RandomLink - Avoiding Linkage-Effects by employing Random Effects for Clustering

Benjamin Schelling, Gert Sluiter, Claudia Plant

Publications: Contribution to book, Contribution to proceedings, Peer Reviewed

Abstract

We present a new parameter-free clustering algorithm that does not impose any assumptions on the data. Based solely on the premise that close data points are more likely to be in the same cluster, it can autonomously create clusters. Neither the number of clusters nor their shape has to be known. The algorithm is similar to SingleLink in that it connects clusters depending on the distances between data points, but while SingleLink is deterministic, RandomLink makes use of random effects. These help RandomLink overcome the SingleLink-effect (or chain-effect), from which SingleLink suffers because it always connects the closest data points. RandomLink is likely to connect close data points but is not forced to; thus, it can sever chains between clusters. We explain in more detail how this negates the SingleLink-effect and how the use of random effects helps overcome the stiffness of parameters for different distance-based algorithms. We show that the algorithm principle is sound by testing it on different data sets and comparing it with standard clustering algorithms, focusing especially on hierarchical clustering methods.
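
The contrast the abstract draws between a deterministic and a randomized choice of the next connection can be illustrated with a small sketch. This is not the authors' RandomLink algorithm, whose details are not given in this record. The function names and the inverse-distance weighting are assumptions made purely for illustration.

```python
import random
from itertools import combinations
from math import dist


def candidate_edges(points):
    """Return every pair (i, j) together with its Euclidean distance."""
    return [(i, j, dist(points[i], points[j]))
            for i, j in combinations(range(len(points)), 2)]


def pick_closest(edges):
    """SingleLink-style choice: always take the shortest edge."""
    return min(edges, key=lambda e: e[2])


def pick_random(edges, rng):
    """Randomized choice: short edges are more likely, but none is forced.

    The inverse-distance weighting is a hypothetical choice for this
    sketch; the paper may weight connections differently.
    """
    weights = [1.0 / (d + 1e-12) for _, _, d in edges]
    return rng.choices(edges, weights=weights, k=1)[0]


# Two tight pairs joined by a single "bridge" point at x = 0.55.
points = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (1.1, 0.0), (0.55, 0.0)]
edges = candidate_edges(points)

rng = random.Random(0)  # fixed seed so the run is repeatable
closest = pick_closest(edges)
samples = [pick_random(edges, rng) for _ in range(1000)]
```

In this sketch `pick_closest` returns the same edge on every call, whereas `pick_random` spreads its picks over several edges while still favoring short ones. That is the property the abstract credits with allowing chains between clusters to be severed.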
Original language: English
Title of host publication: Database and Expert Systems Applications. DEXA 2020
Subtitle of host publication: Proceedings, Part I
Editors: Sven Hartmann, Josef Küng, Gabriele Kotsis, Ismail Khalil, A Min Tjoa
Place of Publication: Cham
Publisher: Springer International Publishing
Pages: 217-232
Number of pages: 16
Volume: 12391
Edition: 1
ISBN (Electronic): 978-3-030-59003-1
DOIs
Publication status: Published - 14 Sept 2020
Event: 31st International Conference on Database and Expert Systems Applications (DEXA 2020) - Bratislava, Slovakia
Duration: 14 Sept 2020 to 17 Sept 2020

Publication series

Series: Lecture Notes in Computer Science
ISSN: 0302-9743

Conference

Conference: 31st International Conference on Database and Expert Systems Applications (DEXA 2020)
Country/Territory: Slovakia
City: Bratislava
Period: 14/09/20 to 17/09/20

Austrian Fields of Science 2012

  • 102033 Data mining

Keywords

  • EFFICIENT ALGORITHM
