Abstract
Deep clustering algorithms have gained popularity for clustering complex, large-scale data sets, but getting started is difficult because of the numerous decisions regarding architecture, optimizer, and other hyperparameters. Theoretical foundations must be known to obtain meaningful results. At the same time, ease of use is necessary for the algorithms to be adopted by a broader audience. Therefore, a unified framework is needed that allows for easy execution in diverse settings. While such frameworks are available for established clustering methods like k-Means and DBSCAN, deep clustering algorithms lack a standard structure, resulting in significant programming overhead. This complicates empirical evaluations, which are essential in both scientific and practical applications. We present a solution to this problem by providing a theoretical background on deep clustering as well as practical implementation techniques and a unified structure with predefined neural networks. For the latter, we use the Python package ClustPy. The aim is to share best practices and facilitate community participation in deep clustering research.
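To illustrate what a "unified structure" for deep clustering can look like, the following is a minimal sketch, not ClustPy's API and not the authors' implementation. It assumes an estimator convention in the style of scikit-learn (a `fit` method that sets `labels_`, the same interface k-Means implementations commonly expose) and the common two-stage scheme of pretraining an autoencoder and then running k-Means in the learned embedding. The class `LinearAEKMeans` and all of its parameters are hypothetical. A *linear* autoencoder trained by full-batch gradient descent stands in for a real neural network so that the sketch depends only on NumPy, and k-Means uses a farthest-first initialisation chosen for determinism.

```python
import numpy as np


class LinearAEKMeans:
    """Hypothetical two-stage deep-clustering estimator (illustration only).

    Stage 1 pretrains an autoencoder; stage 2 runs k-Means on the embedding.
    A linear autoencoder stands in for a neural network (assumption).
    """

    def __init__(self, n_clusters, embedding_size=2, lr=0.05, epochs=500,
                 kmeans_iters=100, random_state=0):
        self.n_clusters = n_clusters
        self.embedding_size = embedding_size
        self.lr = lr
        self.epochs = epochs
        self.kmeans_iters = kmeans_iters
        self.random_state = random_state

    def _pretrain(self, X, rng):
        # Minimise the mean squared reconstruction error with full-batch gradient descent.
        n, d = X.shape
        self.W_enc_ = 0.1 * rng.standard_normal((d, self.embedding_size))
        self.W_dec_ = 0.1 * rng.standard_normal((self.embedding_size, d))
        for _ in range(self.epochs):
            Z = X @ self.W_enc_                        # encode
            G = 2.0 / n * (Z @ self.W_dec_ - X)        # dLoss / dReconstruction
            grad_dec = Z.T @ G
            grad_enc = X.T @ (G @ self.W_dec_.T)
            self.W_dec_ -= self.lr * grad_dec
            self.W_enc_ -= self.lr * grad_enc

    def _kmeans(self, Z, rng):
        # Farthest-first initialisation: each new center is the point farthest from all chosen ones.
        centers = [Z[rng.integers(len(Z))]]
        for _ in range(1, self.n_clusters):
            dist = np.min([np.sum((Z - c) ** 2, axis=1) for c in centers], axis=0)
            centers.append(Z[np.argmax(dist)])
        centers = np.array(centers)
        # Lloyd iterations; an empty cluster keeps its previous center.
        for _ in range(self.kmeans_iters):
            dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = dist.argmin(axis=1)
            new_centers = np.array([
                Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(self.n_clusters)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        return labels, centers

    def fit(self, X):
        rng = np.random.default_rng(self.random_state)
        X = np.asarray(X, dtype=float)
        self.mean_ = X.mean(axis=0)
        self.std_ = X.std(axis=0) + 1e-12
        Xs = (X - self.mean_) / self.std_              # standardise features
        self._pretrain(Xs, rng)
        self.labels_, self.cluster_centers_ = self._kmeans(Xs @ self.W_enc_, rng)
        return self


if __name__ == "__main__":
    # Two well-separated synthetic blobs in three dimensions.
    rng = np.random.default_rng(42)
    X = np.vstack([rng.normal(0.0, 0.5, (50, 3)), rng.normal(6.0, 0.5, (50, 3))])
    model = LinearAEKMeans(n_clusters=2).fit(X)
    print(model.labels_)
```

Because the deep method exposes the same `fit`/`labels_` interface as a classical clustering estimator, the two can be swapped in one evaluation loop. In a real deep clustering framework, the encoder would be a trainable neural network and the clustering stage is often optimised jointly with it, which this sketch deliberately omits.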
Original language | English |
---|---|
Title of host publication | CIKM 2023 - Proceedings of the 32nd ACM International Conference on Information and Knowledge Management |
Subtitle of host publication | Proceedings of the 32nd ACM International Conference on Information and Knowledge Management |
Publisher | ACM |
Pages | 5208-5211 |
Number of pages | 4 |
ISBN (Electronic) | 979-8-4007-0124-5 |
ISBN (Print) | 979-8-4007-0124-5 |
Publication status | Published - 21 Oct 2023 |
Austrian Fields of Science 2012
- 102033 Data mining
Keywords
- data mining
- deep clustering
- representation learning
- unsupervised learning