Fair k-Center Clustering in MapReduce and Streaming Settings

Suman Bera, Syamantak Das, Sainyam Galhotra, Sagar Kale

Veröffentlichungen: Beitrag in BuchBeitrag in KonferenzbandPeer Reviewed


Center-based clustering techniques are fundamental to many real-world applications such as data summarization and social network analysis. In this work, we study the problem of fairness aware k-center clustering over large datasets. We are given an input dataset comprising a set of n points, where each point belongs to a specific demographic group characterized by a protected attribute, such as race or gender. The goal is to identify k clusters such that all clusters have considerable representation from all groups and the maximum radius of these clusters is minimized. The majority of the prior techniques do not scale beyond 100K points for k = 50. To address the scalability challenges, we propose an efficient 2-round algorithm for the MapReduce setting that is guaranteed to be a 9-approximation to the optimal solution. Additionally, we develop a 2-pass streaming algorithm that is efficient and has a low memory footprint. These theoretical results are complemented with an empirical evaluation on million-scale datasets, demonstrating that our techniques are effective to identify high-quality fair clusters and efficient as compared to the state-of-the-art.

TitelWWW 2022 - Proceedings of the ACM Web Conference 2022
Redakteure*innenFrédérique Laforest, Raphael Troncy, Elena Simperl
ErscheinungsortNew York, NY
Herausgeber (Verlag)Association for Computing Machinery (ACM)
ISBN (elektronisch)9781450390965
PublikationsstatusVeröffentlicht - Apr. 2022
VeranstaltungThe Web Conf 2022 - Virtual Online, Lyon, Frankreich
Dauer: 25 Apr. 202229 Apr. 2022


KonferenzThe Web Conf 2022

ÖFOS 2012

  • 102031 Theoretische Informatik