Cache-oblivious High-performance Similarity Join

Martin Perdacher, Claudia Plant, Christian Böhm

Publications: Contribution to bookContribution to proceedingsPeer Reviewed

Abstract

A similarity join combines vectors based on a distance condition. Typically, such algorithms apply a filter step (by indexing or sorting) and then refine pairs of candidate vectors. In this paper, we propose to refine the pairs in an order defined by a space-filling curve which dramatically improves data locality. Modern multi-core microprocessors are supported by a deep memory hierarchy including RAM, various levels of cache, and registers. The space-filling curve makes our proposed algorithm cache-oblivious to fully exploit the memory hierarchy and to reach the possible peak performance of a multi-core processor. Our novel space-filling curve called Fast General Form (FGF) Hilbert solves a number of limitations of well-known approaches: it is non-recursive, it is not restricted to traverse squares, and it has a constant time and space complexity. As we demonstrate the easy transformation from conventional into cache-oblivious loops we believe that many algorithms for complex joins and other database operators could be transformed systematically into cache-oblivious SIMD and MIMD parallel algorithms.
Original languageEnglish
Title of host publicationProceedings of the 2019 International Conference on Management of Data, SIGMOD Conference 2019, Amsterdam, The Netherlands, June 30 - July 5, 2019.
Pages87-104
Number of pages18
ISBN (Electronic)9781450356435
DOIs
Publication statusPublished - 2019
EventACM SIGMOD Conference: ACM Special Interest Group on Management of Data - Beurs van Berlage, Amsterdam, Netherlands
Duration: 30 Jun 20195 Jul 2019
http://sigmod2019.org/

Conference

ConferenceACM SIGMOD Conference
Abbreviated titleSIGMOD
Country/TerritoryNetherlands
CityAmsterdam
Period30/06/195/07/19
Internet address

Austrian Fields of Science 2012

  • 102010 Database systems

Keywords

  • ALGORITHM
  • CURVES
  • Cache-oblivious
  • Epsilon Grid Order
  • Hilbert-curve
  • INDEX
  • METRIC-SPACES
  • NEAREST-NEIGHBOR
  • SET
  • Similarity Join
  • Space-filling curve

Fingerprint

Dive into the research topics of 'Cache-oblivious High-performance Similarity Join'. Together they form a unique fingerprint.
  • Improved Data Locality Using Morton-order Curve on the Example of LU Decomposition

    Perdacher, M., Plant, C. & Böhm, C., 10 Dec 2020, 2020 IEEE International Conference on Big Data: Dec 10 - Dec 13, 2020 • Virtual Event. Wu, X., Jermaine, C., Xiong, L., Kotevska, O., Lu, S., Xu, W., Aluru, S., Zhai, C., Al-Masri, E., Chen, Z. & Saltz, C. (eds.). IEEE, p. 351-360 10 p.

    Publications: Contribution to bookContribution to proceedingsPeer Reviewed

Cite this