Clustering can be defined as the grouping of data points
based on some commonality or similarity between the points.

One of the simplest methods is K-means clustering. In this
method, the number of clusters is initialized and the center of each of the
cluster is randomly chosen. The Euclidean distance between each data point and
all the center of the clusters is computed and based on the minimum distance
each data point is assigned to certain cluster. The new center for the cluster
is defined and the Euclidean distance is calculated. This procedure iterates
till convergence is reached.

Let’s see how to generate some random data points and some
random cluster points

MATLAB CODE:

%Generate sample data
points and assign random centre for each cluster

%Number of data points

sz=100;

sz1=250;

X = random('unid',sz1,[sz 1]); %Value

Xp = random('unid',sz1,[sz 1]); %Position

%Number of clusters

c=6;

V=random('unid',sz1,[c 1]); %Value

Vp=random('unid',sz1,[c 1]); %Position

figure,plot(Xp,X,'*',Vp,V,'r+');title('Data points
and the initial Cluster centers');

**Explanation:**

100 data points are generated and the number of clusters are
assumed to be 6 and 6 random cluster points are generated.

**Group the data points:**

**MATLAB CODE:**

%Preallocate the vectors

V1=zeros(size(V));

Vp1=zeros(size(Vp));

flag=1;

while flag==1

%Find the euclidean
distance between the data points and all the center of

%the clusters

J=sqrt(abs(repmat(Xp,[1 c])-repmat(Vp,[1
sz])').^2+abs(repmat(X,[1 c])-repmat(V,[1 sz])').^2);

%Find the minimum distance
between the data point and the cluster

[mv,Gpos]=min(J,[],2);

CGroup=zeros([sz c]);

colr=colormap(jet(c));

figure(3),

for i = 1:c

Temp = find(Gpos==i);

CGroup(1:numel(Temp),i)=Temp;

V1(i,:)=mean(X(Temp));

Vp1(i,:)=mean(Xp(Temp));

Pos=ones(numel(Temp)*2,1)*Vp1(i);

Pos(2:2:end)=Xp(Temp);

Value=ones(numel(Temp)*2,1)*V1(i);

Value(2:2:end)=X(Temp);

%Define the new centre for each
cluster

plot(Pos,Value,'Color',colr(i,:),'LineStyle','-','Marker','o');hold on;

plot(Vp1(i),V1(i),'k+');

end

hold off;

Diffv=abs(V-V1);

DiffVp=abs(Vp-Vp1);

%Iterate the process till
there is no change in the cluster position

if(Diffv < 1)

flag=0;

else

V=V1;

Vp=Vp1;

end

end

figure,plot(Xp,X,'*',Vp,V,'g+');title('Data points
and the Final Cluster centers');

Fig. The new position(red circle)
of the clusters after the final iteration.

**Explanation:**

In the above figure, the data
points are represented in blue color stars and the cluster centers are
represented in red color cross shape.

Let’s consider one particular
data point and all the cluster centers.

Data point position X = 13, Y =
20

Cluster 1 position X = 8,
Y = 19

Cluster 2 position X = 13, Y = 15

**Step 1:**Find the Euclidean Distance:

Find the Euclidean distance(D1)
between data point and the cluster 1 similarly, find the Euclidean distance(D2)
between data point and the cluster 2

Distance D1 = sqrt((13-8).^2+(20-19).^2))
= 5.0990

Distance D2 =
sqrt((13-13).^2+(20-15).^2))= 5.0000

**Step 2:**Find the minimum and assign the data point to a cluster

Now the minimum distance among
the two results is for the cluster 2.

So the data point with
(X,Y)=(13,20) is assigned to the cluster/group 2.

**Step 3:**Perform the step 1 and step 2 for all the data points and assign group accordingly.

**Step 4:**Assign a new position for the clusters based on the clustering done.

Find the average position of the newly
assigned data points to a particular cluster and use that average as the new
position for the cluster.

**Step 5:**Iterate this procedure till the position of the clusters are unchanged.

Number of clusters = 3

Number of clusters = 15

## 0 comments:

## Enjoyed Reading? Share Your Views