Sunday, 10 September 2017

Hierarchical Clustering (Bottom-Up Clustering) & Performance Parameters


K-means requires the number of clusters to be set beforehand. When we want to see how and where clusters merge, and do not want to specify the number of clusters in advance, we use hierarchical clustering (agglomerative clustering).

It starts by calculating the distance between every pair of objects. The nearest pair of objects is then merged into a cluster, so new clusters are created. Distances between all the new clusters are recalculated, and the closest ones are merged again based on the minimal distance. This process continues until a single cluster is obtained.
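As a rough illustration, here is a minimal sketch of this bottom-up procedure using SciPy's hierarchical clustering routines; the toy data and parameter values are made up for illustration.

```python
# A minimal sketch of bottom-up (agglomerative) clustering with SciPy.
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, fcluster, dendrogram

# Toy data: two loose groups of points in 2-D (illustrative only)
rng = np.random.RandomState(42)
X = np.vstack([rng.normal(0, 0.5, (10, 2)),
               rng.normal(5, 0.5, (10, 2))])

# linkage() starts from singleton clusters and repeatedly merges the
# closest pair; the `method` argument ('single', 'complete', ...)
# changes how cluster-to-cluster distance is measured.
Z = linkage(X, method='complete')

# Cut the merge tree to obtain flat cluster labels (here: 2 clusters)
labels = fcluster(Z, t=2, criterion='maxclust')
print(labels)

# Plot the merge history as a dendrogram
dendrogram(Z)
plt.show()
```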


Bottom-up approach of Hierarchical Clustering


This differs from K-means in the following ways-
  1. It is computationally expensive, since the distance between every pair of points must be calculated initially.
  2. It makes greedy merge decisions that cannot be undone, so it may yield locally optimised clusters, whereas k-means keeps reassigning points to improve its overall objective.
  3. The resulting clusters may differ depending on the linkage criterion, e.g. single linkage or complete linkage (the method argument in the sketch above). Single linkage can also suffer from chaining of clusters.
  4. When there are many objects, reading the resulting tree (dendrogram) becomes difficult.
Like k-means, hierarchical clustering cannot handle categorical variables. One-hot encoding can help to some extent, but as the number of variables increases, the clustering degrades. In those scenarios we can use k-modes clustering, sketched below.
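A minimal sketch of k-modes, assuming the third-party kmodes package (installable with pip install kmodes); the data and parameter values here are illustrative only.

```python
# A sketch of k-modes on categorical data, assuming the third-party
# `kmodes` package is installed (pip install kmodes).
import numpy as np
from kmodes.kmodes import KModes

# Toy categorical data (values are made up for illustration)
X = np.array([
    ['red',  'small', 'round'],
    ['red',  'small', 'oval'],
    ['blue', 'large', 'round'],
    ['blue', 'large', 'square'],
])

# k-modes replaces means with modes and Euclidean distance with a
# simple matching dissimilarity, so it works on categories directly.
km = KModes(n_clusters=2, init='Huang', n_init=5, verbose=0)
labels = km.fit_predict(X)
print(labels)                  # cluster assignment per row
print(km.cluster_centroids_)   # the modes of each cluster
```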


Performance parameters of clustering-

Dunn’s Index- The Dunn Index assesses the goodness of a clustering by relating the minimal distance between clusters to the maximal diameter of a cluster. In a simple centre-based form it can be calculated as-

(minimum distance between cluster centres) / (maximum diameter of a cluster)

Higher values are desirable.
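A rough sketch of this centre-based variant in plain NumPy; the helper name dunn_index is our own, and other variants of the index use point-wise rather than centre-wise distances between clusters.

```python
import numpy as np
from itertools import combinations

def dunn_index(X, labels):
    clusters = [X[labels == k] for k in np.unique(labels)]
    centres = [c.mean(axis=0) for c in clusters]
    # Smallest distance between any two cluster centres (separation)
    min_sep = min(np.linalg.norm(a - b)
                  for a, b in combinations(centres, 2))
    # Largest within-cluster point-to-point distance (diameter)
    max_diam = max(np.linalg.norm(p - q)
                   for c in clusters
                   for p, q in combinations(c, 2))
    return min_sep / max_diam

# Toy usage with two obvious groups (illustrative only)
X = np.array([[0., 0.], [0., 1.], [5., 5.], [5., 6.]])
labels = np.array([0, 0, 1, 1])
print(dunn_index(X, labels))  # large value: well-separated clusters
```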

Jaccard Index- |A intersection B| / |A union B|. Higher values (closer to 1) indicate better agreement between two groupings, e.g. a clustering and a set of reference labels.
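For example, a tiny sketch of the Jaccard Index on two sets:

```python
# Jaccard Index of two sets: size of intersection over size of union
def jaccard_index(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

print(jaccard_index({1, 2, 3, 4}, {3, 4, 5}))  # 2/5 = 0.4
```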


Silhouette Score- It is calculated for each point in a clustering and depends on two quantities- 1) a(i), the average distance from the i-th point to the other points in its own cluster; 2) b(i), the average distance from the i-th point to the points in the nearest neighbouring cluster. a(i) can be read as compactness and b(i) as separation. The silhouette of point i is-

s(i) = {b(i) - a(i)} / max{a(i), b(i)}

It lies between -1 and 1. Higher values are desirable; a value near 0 indicates that the point lies on the boundary between two clusters.
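A minimal sketch of computing the silhouette score with scikit-learn, which averages s(i) over all points; the toy data and the use of K-means to produce labels are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Toy data: two well-separated groups (illustrative only)
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0, 0.5, (20, 2)),
               rng.normal(4, 0.5, (20, 2))])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(silhouette_score(X, labels))  # between -1 and 1; higher is better
```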

BIC values can also be used to assess clustering fit, for example with model-based clustering such as Gaussian mixtures. These indices are relative and can be used to compare clustering algorithms, but there is no cut-off that says the clustering is now perfect; the accuracy of a clustering can, however, be validated if we have some pre-defined classes.
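A sketch of using BIC to compare different numbers of clusters via Gaussian mixture models in scikit-learn; lower BIC indicates a better trade-off between fit and model complexity. The toy data is made up for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Toy data: two groups, so BIC should favour around 2 components
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0, 0.5, (30, 2)),
               rng.normal(4, 0.5, (30, 2))])

for k in range(1, 5):
    gmm = GaussianMixture(n_components=k, random_state=0).fit(X)
    print(k, gmm.bic(X))  # lower BIC = better fit/complexity balance
```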

Clustering is closely related to identifying abnormalities- see how the two are related here-

Anomaly detection techniques
