Testing the performance of an algorithm is necessary to establish its applicability to real data and, in turn, to guide the development of software. The clustering of a data set may be either fuzzy (with vague boundaries between clusters) or crisp (with well-defined, fixed boundaries). The present work focuses on measuring the performance of some similarity-based fuzzy clustering algorithms: three methods are developed, each with three different approaches. In the first method, cluster centers are chosen based on the minimum of the entropy (probability) values of the data points [10]. In the second method, cluster centers are selected based on the maximum of the total similarity values of the data points, and in the third method, a ratio of dissimilarity to similarity is used to determine the cluster centers. The performance of these methods and approaches is compared on three standard data sets: IRIS, WINES, and OLITOS. Experimental results show that the entropy-based method generates better-quality clusters, but at the cost of slightly more computation. Finally, the best sets of clusters are mapped to 2-D using a self-organizing map (SOM) for visualization.
Received on October 3, 2005
References: 14
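
As an illustration of the first two center-selection criteria summarized in the abstract, the following Python sketch picks a single cluster center either by minimum entropy or by maximum total similarity. It is a minimal sketch under stated assumptions, not the authors' implementation: the exponential similarity function, the choice of `alpha`, the binary-entropy form, and all function names are assumptions introduced here for illustration. The ratio-based criterion is left out because the abstract does not give its exact form.

```python
import math


def similarity_matrix(points):
    """Pairwise similarity S[i][j] = exp(-alpha * d_ij) (assumed form).

    Assumption: alpha is set so that two points at the mean pairwise
    distance have similarity 0.5. Needs at least two distinct points,
    otherwise the mean distance is zero.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    pair_dists = [dist[i][j] for i in range(n) for j in range(i + 1, n)]
    mean_dist = sum(pair_dists) / len(pair_dists)
    alpha = math.log(2.0) / mean_dist
    return [[math.exp(-alpha * d) for d in row] for row in dist]


def binary_entropy(s):
    """Entropy in bits of a similarity value s in [0, 1]."""
    if s <= 0.0 or s >= 1.0:
        return 0.0  # limit value at the endpoints
    return -(s * math.log2(s) + (1.0 - s) * math.log2(1.0 - s))


def entropy_scores(sim):
    """Total entropy of each point with respect to all other points."""
    n = len(sim)
    return [sum(binary_entropy(sim[i][j]) for j in range(n) if j != i)
            for i in range(n)]


def total_similarity_scores(sim):
    """Sum of each point's similarities to all other points."""
    n = len(sim)
    return [sum(sim[i][j] for j in range(n) if j != i) for i in range(n)]


def select_center(points, criterion):
    """Return the index of the point chosen as a cluster center."""
    sim = similarity_matrix(points)
    indices = range(len(points))
    if criterion == "entropy":
        scores = entropy_scores(sim)
        return min(indices, key=scores.__getitem__)  # minimum entropy
    if criterion == "similarity":
        scores = total_similarity_scores(sim)
        return max(indices, key=scores.__getitem__)  # maximum total similarity
    raise ValueError(f"unknown criterion: {criterion!r}")


if __name__ == "__main__":
    # Hypothetical toy data: a tight group of four points plus one outlier.
    toy = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
    print(select_center(toy, "entropy"), select_center(toy, "similarity"))
```

The sketch selects only one center. A complete clustering procedure would need further steps, such as assigning points to centers and choosing subsequent centers, which the abstract does not describe.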