Abstract
Center-based clustering techniques are fundamental to many real-world applications such as data summarization and social network analysis. In this work, we study the problem of fairness aware k-center clustering over large datasets. We are given an input dataset comprising a set of n points, where each point belongs to a specific demographic group characterized by a protected attribute, such as race or gender. The goal is to identify k clusters such that all clusters have considerable representation from all groups and the maximum radius of these clusters is minimized. The majority of the prior techniques do not scale beyond 100K points for k = 50. To address the scalability challenges, we propose an efficient 2-round algorithm for the MapReduce setting that is guaranteed to be a 9-approximation to the optimal solution. Additionally, we develop a 2-pass streaming algorithm that is efficient and has a low memory footprint. These theoretical results are complemented with an empirical evaluation on million-scale datasets, demonstrating that our techniques are effective to identify high-quality fair clusters and efficient as compared to the state-of-the-art.
Original language | English |
---|---|
Title of host publication | WWW 2022 - Proceedings of the ACM Web Conference 2022 |
Editors | Frédérique Laforest, Raphael Troncy, Elena Simperl |
Place of Publication | New York, NY |
Publisher | Association for Computing Machinery (ACM) |
Pages | 1414-1422 |
Number of pages | 9 |
ISBN (Electronic) | 9781450390965 |
DOIs | |
Publication status | Published - Apr 2022 |
Event | The Web Conf 2022 - Virtual Online, Lyon, France Duration: 25 Apr 2022 → 29 Apr 2022 |
Conference
Conference | The Web Conf 2022 |
---|---|
Country/Territory | France |
City | Lyon |
Period | 25/04/22 → 29/04/22 |
Austrian Fields of Science 2012
- 102031 Theoretical computer science
Keywords
- disparate impact
- fairness
- k-center clustering