Utilizing Structure-rich Features to improve Clustering

Benjamin Schelling, Lena Bauer, Sahar Behzadi Soheil, Claudia Plant

Publications: Contribution to book, Contribution to proceedings, Peer Reviewed

Abstract

For successful clustering, an algorithm needs to find the boundaries between clusters. While this is comparatively easy if the clusters are compact and non-overlapping and thus the boundaries clearly defined, features where the clusters blend into each other hinder clustering methods from correctly estimating these boundaries. Therefore, we aim to extract features showing clear cluster boundaries and thus enhance the cluster structure in the data. Our novel technique creates a condensed version of the data set containing the structure important for clustering, but without the noise information. We demonstrate that this transformation of the data set is much easier to cluster for k-means, but also for various other algorithms. Furthermore, we introduce a deterministic initialisation strategy for k-means based on these structure-rich features.
Original language: English
Title of host publication: Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2020, Proceedings
Subtitle of host publication: European Conference, ECML PKDD 2020, Ghent, Belgium, September 14–18, 2020, Proceedings, Part I
Editors: Frank Hutter, Kristian Kersting, Jefrey Lijffijt, Isabel Valera
Place of Publication: Cham
Publisher: Springer International Publishing
Pages: 91-107
Number of pages: 17
Volume: 12457
Edition: 1
ISBN (Electronic): 9783030676582
ISBN (Print): 978-3-030-67657-5
Publication status: Published - 2021
Event: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2020) - Ghent, Belgium
Duration: 14 Sept 2020 – 18 Sept 2020

Publication series

Series: Lecture Notes in Computer Science
ISSN: 0302-9743

Conference

Conference: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2020)
Country/Territory: Belgium
City: Ghent
Period: 14/09/20 – 18/09/20

Austrian Fields of Science 2012

  • 102033 Data mining

Keywords

  • ALGORITHM
