Benchmarking Deep Clustering Algorithms With ClustPy

Collin Leiber, Lukas Miklautz, Claudia Plant, Christian Böhm

Publications: Contribution to book, Contribution to proceedings, Peer Reviewed

Abstract

Deep clustering algorithms have gained popularity as they are able to cluster complex large-scale data, like images. Yet these powerful algorithms require many decisions w.r.t. architecture, learning rate and other hyperparameters, making it difficult to compare different methods. A comprehensive empirical evaluation of novel clustering methods, however, plays an important role in both scientific and practical applications, as it reveals their individual strengths and weaknesses. Therefore, we introduce ClustPy, a unified framework for benchmarking deep clustering algorithms, and perform a comparison of several fundamental deep clustering methods and some recently introduced ones. We compare these methods on multiple well-known image data sets using different evaluation metrics, conduct a sensitivity analysis w.r.t. important hyperparameters, and perform ablation studies, e.g., for different autoencoder architectures and image augmentation. To our knowledge, this is the first in-depth benchmarking of deep clustering algorithms in a unified setting.

Original language: English
Title of host publication: Proceedings - 23rd IEEE International Conference on Data Mining Workshops
Subtitle of host publication: ICDMW 2023
Editors: Jihe Wang, Yi He, Thang N. Dinh, Christan Grant, Meikang Qiu, Witold Pedrycz
Publisher: IEEE
Pages: 625-632
Number of pages: 8
ISBN (Electronic): 9798350381641
ISBN (Print): 979-8-3503-8165-8
Publication status: Published - 2023

Austrian Fields of Science 2012

  • 102033 Data mining

Keywords

  • Benchmarking
  • Data Mining
  • Deep Clustering
  • Representation Learning
  • Unsupervised Learning
