TY - GEN
T1 - Utilizing Structure-Rich Features to Improve Clustering
AU - Schelling, Benjamin
AU - Bauer, Lena
AU - Behzadi Soheil, Sahar
AU - Plant, Claudia
PY - 2021
Y1 - 2021
N2 - For successful clustering, an algorithm needs to find the boundaries between clusters. While this is comparatively easy if the clusters are compact and non-overlapping and the boundaries are thus clearly defined, features in which the clusters blend into each other hinder clustering methods from correctly estimating these boundaries. Therefore, we aim to extract features showing clear cluster boundaries and thus enhance the cluster structure in the data. Our novel technique creates a condensed version of the data set containing the structure important for clustering, but without the noise information. We demonstrate that this transformed data set is much easier to cluster for k-means, and also for various other algorithms. Furthermore, we introduce a deterministic initialisation strategy for k-means based on these structure-rich features.
AB - For successful clustering, an algorithm needs to find the boundaries between clusters. While this is comparatively easy if the clusters are compact and non-overlapping and the boundaries are thus clearly defined, features in which the clusters blend into each other hinder clustering methods from correctly estimating these boundaries. Therefore, we aim to extract features showing clear cluster boundaries and thus enhance the cluster structure in the data. Our novel technique creates a condensed version of the data set containing the structure important for clustering, but without the noise information. We demonstrate that this transformed data set is much easier to cluster for k-means, and also for various other algorithms. Furthermore, we introduce a deterministic initialisation strategy for k-means based on these structure-rich features.
KW - ALGORITHM
UR - http://www.scopus.com/inward/record.url?scp=85103282606&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-67658-2_6
DO - 10.1007/978-3-030-67658-2_6
M3 - Contribution to proceedings
SN - 978-3-030-67657-5
VL - 12457
T3 - Lecture Notes in Computer Science
SP - 91
EP - 107
BT - Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2020, Proceedings
A2 - Hutter, Frank
A2 - Kersting, Kristian
A2 - Lijffijt, Jefrey
A2 - Valera, Isabel
PB - Springer International Publishing
CY - Cham
T2 - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2020)
Y2 - 14 September 2020 through 18 September 2020
ER -