Publication Status

Published

Document Type

Conference paper

Department / Unit

Department of Computing and Decision Sciences

Publication Date

1-1-2002

Language

English

Abstract

Scalable cluster analysis addresses the problem of processing large data sets with limited resources, e.g., memory and computation time. A data summarization or sampling procedure is an essential step of most scalable algorithms: it forms a compact representation of the data, on which traditional clustering algorithms can then process large data sets efficiently. However, there is little work on how to perform cluster analysis effectively on data summaries. In this paper, building on the principle of the general expectation-maximization algorithm, we propose a model-based clustering algorithm that makes better use of these data summaries. The proposed EMACF (Expectation-Maximization Algorithm on Clustering Features) algorithm explicitly employs data summary features including weight, mean, and variance. We prove that EMACF converges to a local maximum of the likelihood. The computation time of EMACF is linear in the number of data summaries rather than the number of data items, so EMACF can be integrated with any efficient data summarization procedure to construct a scalable clustering algorithm.
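To make the idea of running EM on summaries concrete, the following is a minimal one-dimensional sketch, not the EMACF algorithm as specified in the paper. It assumes each summary is a (weight, mean, variance) triple and scores a summary against a Gaussian component by the average log-density of its points, which depends only on the summary's mean and variance. The function name `em_on_summaries`, the initial variances, and the iteration count are all illustrative assumptions.

```python
import math

def em_on_summaries(summaries, init_means, n_iter=50, min_var=1e-6):
    """EM for a 1-D Gaussian mixture fitted to data summaries.

    summaries:  list of (weight, mean, variance) triples, one per subcluster.
    init_means: starting component means (an assumed initialisation).
    Returns (mixing proportions, means, variances).
    """
    k = len(init_means)
    total = sum(w for w, _, _ in summaries)
    means = list(init_means)
    variances = [1.0] * k          # assumed starting variances
    mix = [1.0 / k] * k

    for _ in range(n_iter):
        # E-step: one responsibility vector per summary, not per data item.
        resp = []
        for _, mu, s2 in summaries:
            logs = [
                math.log(mix[j])
                - 0.5 * (math.log(2 * math.pi * variances[j])
                         + (s2 + (mu - means[j]) ** 2) / variances[j])
                for j in range(k)
            ]
            top = max(logs)        # subtract the max for numerical stability
            exps = [math.exp(v - top) for v in logs]
            z = sum(exps)
            resp.append([e / z for e in exps])

        # M-step: weighted updates that use each summary's weight and variance.
        for j in range(k):
            nj = sum(w * r[j] for (w, _, _), r in zip(summaries, resp))
            if nj < 1e-12:
                continue           # keep old parameters for an empty component
            new_mean = sum(w * r[j] * mu
                           for (w, mu, _), r in zip(summaries, resp)) / nj
            new_var = sum(w * r[j] * (s2 + (mu - new_mean) ** 2)
                          for (w, mu, s2), r in zip(summaries, resp)) / nj
            mix[j] = nj / total
            means[j] = new_mean
            variances[j] = max(new_var, min_var)

    return mix, means, variances
```

Each iteration loops over the summaries only, so its cost grows with the number of summaries and components rather than with the number of original data items, which is the property the abstract emphasizes.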

Fulltext file version

Accepted author manuscript

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Event Title

Proceedings of the Second International Workshop on Intelligent Systems Design and Applications

Pure ID

13689250

Pure UUID

153d36df-c0f0-495c-baa3-382d00a3da7a

An Expectation-Maximization Algorithm Working on Data Summary
