Scalable model-based cluster analysis using clustering features
Document Type
Journal article
Source Publication
Pattern Recognition
Publication Date
5-1-2005
Volume
38
Issue
5
First Page
637
Last Page
649
Keywords
Cluster analysis, Clustering feature, Convergence, Data mining, Expectation maximization, Gaussian mixture model, Scalable
Abstract
We present two scalable model-based clustering systems based on a Gaussian mixture model with independent attributes within clusters. Both systems first summarize the data into sub-clusters and then generate Gaussian mixtures from the sub-clusters' clustering features using a new algorithm, EMACF. EMACF approximates the aggregate behavior of each sub-cluster of data items in the Gaussian mixture model, and it provably converges. Experiments show that our clustering systems run one or two orders of magnitude faster than the traditional EM algorithm with little loss of accuracy.
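The abstract does not define a clustering feature, so the following is only a minimal sketch of the summarization step under an assumption: that a clustering feature is the BIRCH-style triple of item count, per-attribute sum, and per-attribute sum of squares. The class name `ClusteringFeature` and its methods are hypothetical and are not taken from the paper, and the sketch does not implement EMACF itself. It shows why such a summary suits a mixture model with independent attributes: the per-attribute mean and variance of a sub-cluster can be recovered without revisiting the raw data, and two summaries can be merged by addition.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ClusteringFeature:
    """Hypothetical BIRCH-style summary of one sub-cluster (an assumption,
    not the paper's definition): count, per-attribute sum, and
    per-attribute sum of squares."""
    n: int
    linear_sum: List[float]
    square_sum: List[float]

    @classmethod
    def from_points(cls, points: Sequence[Sequence[float]]) -> "ClusteringFeature":
        # Build the summary in one pass per attribute over the raw items.
        dim = len(points[0])
        linear = [sum(p[d] for p in points) for d in range(dim)]
        square = [sum(p[d] ** 2 for p in points) for d in range(dim)]
        return cls(len(points), linear, square)

    def merge(self, other: "ClusteringFeature") -> "ClusteringFeature":
        # Summaries are additive, so sub-clusters merge without raw data.
        return ClusteringFeature(
            self.n + other.n,
            [a + b for a, b in zip(self.linear_sum, other.linear_sum)],
            [a + b for a, b in zip(self.square_sum, other.square_sum)],
        )

    def mean(self) -> List[float]:
        return [s / self.n for s in self.linear_sum]

    def variance(self) -> List[float]:
        # Population variance per attribute: E[x^2] - (E[x])^2.
        return [
            sq / self.n - (s / self.n) ** 2
            for s, sq in zip(self.linear_sum, self.square_sum)
        ]


if __name__ == "__main__":
    a = ClusteringFeature.from_points([[1.0, 2.0], [3.0, 4.0]])
    b = ClusteringFeature.from_points([[5.0, 6.0]])
    merged = a.merge(b)
    print(merged.n, merged.mean(), merged.variance())
```

A mixture-fitting step would then operate on these per-sub-cluster summaries rather than on individual items, which is where the speed-up reported in the abstract would come from; how EMACF uses the summaries is specified in the paper, not here.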
DOI
10.1016/j.patcog.2004.07.012
Print ISSN
0031-3203
E-ISSN
1873-5142
Publisher Statement
Copyright © 2005 Pattern Recognition Society
Full-text Version
Publisher’s Version
Language
English
Recommended Citation
Jin, H., Leung, K.-S., Wong, M.-L., & Wu, Z.-B. (2005). Scalable model-based cluster analysis using clustering features. Pattern Recognition, 38(5), 637-649. doi: 10.1016/j.patcog.2004.07.012