INESC-ID   Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa

kdbio

Knowledge Discovery and Bioinformatics
Inesc-ID Lisboa

Clustering and Combination of Clustering Ensembles

11/14/2003 - 13:30
11/14/2003 - 14:30
Etc/GMT

Different clustering algorithms will, in general, produce different data partitions when applied to the same data set. In this talk I address robust clustering as a problem of combining the data partitions produced by multiple clusterings (a clustering ensemble). I propose and analyze a voting mechanism on pairwise associations for combining data partitions, based on the concept of evidence accumulation. The evidence accumulation method is applied to the combination of "weak" clusterers, using the K-means algorithm to produce clustering ensembles. Experimental results show the ability of the technique to identify arbitrarily shaped and sized clusters. I then define objective functions and optimality criteria for evaluating a clustering combination technique, formulated under an information-theoretical framework and taking consistency and robustness as key features. Mutual information is the underlying concept, used to define quantitative measures of agreement between data partitions; robustness is assessed by variance analysis based on bootstrapping. It is shown that the evidence accumulation technique attempts to optimize the given criteria, although optimality is not ensured in all situations.
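As a rough illustration of the pairwise voting idea, the sketch below builds a co-association matrix from a set of given partitions, extracts a combined partition from it, and measures agreement between two partitions with a normalized mutual information score. It is a minimal sketch, not the speaker's implementation. The function names, the 0.5 vote threshold, the connected-components extraction step, and the 2I/(H_a + H_b) normalization are all assumptions made for this example, and the partitions are taken as given rather than produced by K-means runs.

```python
from collections import Counter
from math import log


def co_association(partitions):
    """Return the matrix of vote fractions: entry [i][j] is the share of
    partitions in which samples i and j fall in the same cluster."""
    n = len(partitions[0])
    votes = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    votes[i][j] += 1
    m = len(partitions)
    return [[v / m for v in row] for row in votes]


def consensus_partition(coassoc, threshold=0.5):
    """Link every pair whose vote fraction exceeds the threshold (an assumed
    value) and label the resulting connected components."""
    n = len(coassoc)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and coassoc[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


def normalized_mutual_information(a, b):
    """Agreement between two partitions, 2*I(a;b) / (H(a) + H(b)).
    This normalization is one common choice, assumed here."""
    n = len(a)
    ca, cb, cab = Counter(a), Counter(b), Counter(zip(a, b))
    mi = sum(c / n * log(c * n / (ca[x] * cb[y])) for (x, y), c in cab.items())
    ha = -sum(c / n * log(c / n) for c in ca.values())
    hb = -sum(c / n * log(c / n) for c in cb.values())
    return 2 * mi / (ha + hb) if ha + hb > 0 else 1.0


if __name__ == "__main__":
    # Four hypothetical partitions of six samples; label names are arbitrary.
    ensemble = [
        [0, 0, 0, 1, 1, 1],
        [0, 0, 1, 1, 2, 2],
        [1, 1, 1, 0, 0, 0],
        [2, 2, 2, 0, 0, 0],
    ]
    combined = consensus_partition(co_association(ensemble))
    print(combined)
    for labels in ensemble:
        print(normalized_mutual_information(combined, labels))
```

With this toy ensemble the combined partition is `[0, 0, 0, 1, 1, 1]`: the second partition splits the samples differently, but it is outvoted on every pair. Because votes are counted on pairs rather than on cluster labels, partitions with different label names or different numbers of clusters can be combined directly.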