Scalable model-based cluster analysis using clustering features

Document Type

Journal article

Source Publication

Pattern Recognition

Publication Date

5-1-2005

Volume

38

Issue

5

First Page

637

Last Page

649

Keywords

Cluster analysis, Clustering feature, Convergence, Data mining, Expectation maximization, Gaussian mixture model, Scalable

Abstract

We present two scalable model-based clustering systems based on a Gaussian mixture model with independent attributes within clusters. They first summarize data into sub-clusters, and then generate Gaussian mixtures from the clustering features of those sub-clusters using a new algorithm, EMACF. EMACF approximates the aggregate behavior of each sub-cluster of data items in the Gaussian mixture model, and it provably converges. The experiments show that our clustering systems run one or two orders of magnitude faster than the traditional EM algorithm with little loss of accuracy.
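
The summarization step described in the abstract can be illustrated with a small sketch. It assumes the common definition of a clustering feature as a (count, per-attribute linear sum, per-attribute sum of squares) triple; the class and method names are hypothetical and are not taken from the article. The sketch shows only how sub-cluster summaries are built, merged, and turned back into per-attribute means and variances. It is not an implementation of EMACF.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ClusteringFeature:
    """Hypothetical summary of one sub-cluster (assumed CF triple)."""
    n: int                    # number of data items in the sub-cluster
    linear_sum: List[float]   # per-attribute sum of values
    square_sum: List[float]   # per-attribute sum of squared values

    @classmethod
    def from_points(cls, points: Sequence[Sequence[float]]) -> "ClusteringFeature":
        dim = len(points[0])
        ls = [sum(p[d] for p in points) for d in range(dim)]
        ss = [sum(p[d] ** 2 for p in points) for d in range(dim)]
        return cls(len(points), ls, ss)

    def merge(self, other: "ClusteringFeature") -> "ClusteringFeature":
        # Clustering features are additive, so merging needs no raw data.
        return ClusteringFeature(
            self.n + other.n,
            [a + b for a, b in zip(self.linear_sum, other.linear_sum)],
            [a + b for a, b in zip(self.square_sum, other.square_sum)],
        )

    def mean(self) -> List[float]:
        return [ls / self.n for ls in self.linear_sum]

    def variance(self) -> List[float]:
        # Population variance per attribute: E[x^2] - (E[x])^2.
        return [
            ss / self.n - (ls / self.n) ** 2
            for ls, ss in zip(self.linear_sum, self.square_sum)
        ]


# Two tiny sub-clusters of 2-D points (made-up illustrative data).
cf_a = ClusteringFeature.from_points([[1.0, 2.0], [3.0, 4.0]])
cf_b = ClusteringFeature.from_points([[5.0, 6.0]])
merged = cf_a.merge(cf_b)

print(merged.n, merged.mean(), merged.variance())
```

Because the summaries are additive, statistics for any union of sub-clusters can be recovered without revisiting the original data items, which is what makes working from clustering features cheaper than working from the raw data.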

DOI

10.1016/j.patcog.2004.07.012

Print ISSN

0031-3203

E-ISSN

1873-5142

Publisher Statement

Copyright © 2005 Pattern Recognition Society

Access to external full text or publisher's version may require subscription.

Full-text Version

Publisher’s Version

Language

English

Recommended Citation

Jin, H., Leung, K.-S., Wong, M.-L., & Wu, Z.-B. (2005). Scalable model-based cluster analysis using clustering features. Pattern Recognition, 38(5), 637-649. doi: 10.1016/j.patcog.2004.07.012
