A Hash Map for Graph Processing Workloads & a Methodology for Transactional Data Structures – CppCon

Posted on January 3, 2022
By digitalmedium1
Leave a comment

https://cppcon.org/
https://github.com/CppCon/CppCon2020
---
Emerging byte-addressable Non-Volatile Memories (NVMs) provide higher densities and data persistence where process state can be recovered after crashes. Graph analytic applications such as search and ranking, pattern matching, and clustering benefit from the large capacity available with persistent memory since the data sets are very large with up to trillions of vertices and edges. A hash map is fundamental for graph analytics because it associatively map keys to values, offering constant time lookup for graph updates. While lock-free persistent hash maps have been developed, most perform poorly due to a heavy use of pointer dereferences. Additionally, it is often desirable to perform a composition of graph updates as a transaction. Although persistent data structures enable applications to rely on persistent data with failure-atomic operations, they lack the ability to allow users to execute a sequence of operations as transactions. Meanwhile, persistent transactional memory (PTM) has been proposed by adding durability to Software Transactional Memory (STM). However, PTM suffers from high performance overheads and low scalability due to false aborts, logging, and ordering constraints on persistence.

In this talk, we present PMap, a persistent concurrent hash map with open addressing and non-blocking parallel resizing to cater to large graph processing workloads. Our PMap achieves lock-freedom using compare_exchange_strong from the C++11 Atomic Operations Library. To provide persistence, our design uses a modified version of the link-and-persist approach. While this approach is traditionally used to persist pointers, our modification enables link-and-persist to be used in-place, persisting keys and values directly. This is crucial, as logs, or separate object allocations may require pointer dereferences, which undermines the cache locality benefits of pure open addressing. The PMap uses the C++11 alignas specifier to ensure data can be aligned and flushed on a per-cacheline basis into persistent memory. To accommodate durable transactions, we present PETRA, a new approach for constructing persistent transactional non-blocking linked data structures. PETRA natively supports transactions, but unlike PTM, relies on the high-level information from the data structure semantics. This gives PETRA unique advantages in the form of high performance and high scalability. We present diagrams detailing the PMap design and PETRA methodology for the C++ programming language, and illustrate how to keep the number of cache line flushes and memory fences low through the use of descriptor objects, which are commonly used in the design of lock-free data structures.

We showcase the performance of our proposed persistent data structures by presenting a live demonstration of PMap and PETRA data structures including a linked list, skiplist, map, and multi-dimensional list. We compare PMap to clevel, OneFile, STL, and PMDK, tested on a micro-benchmark and on files from Recursive Model for graph mining (R-MAT). We compare the PETRA data structures to OneFile, Romulus, and PMDK, tested on a micro-benchmark and the Telecommunication Application Transaction Processing (TATP) benchmark. The live demonstration illustrates how PMap outperforms state-of-the-art alternatives averaging 3122x faster under Intel Optane DC, and PETRA outperforms the state-of-the-art PTMs by an order of magnitude in transactions of size greater than one, and demonstrates superior performance in transactions of size one.

---
Kenneth Lamar

Kenneth Lamar received his Bachelor of Science (2017) from the University of Central Florida. He is currently pursuing his PhD in Computer Science at UCF. His research interests include concurrent data structures and distributed computing. His work is currently focused on HPC scheduling, transactional data structures, and persistent memory.

Christina Peterson

Dr. Christina Peterson is a Postdoctoral Associate at the University of Central Florida. Dr. Peterson completed her dissertation work under the supervision of Dr. Damian Dechev with a focus on correctness and progress guarantees of multiprocessor algorithms and data structures. Her published work includes the development of a durable linearizable correctness tool, concurrent correctness conditions defined in vector space, a transactional correctness tool for abstract data types, a correctness condition specification tool, a framework to verify progress guarantees, and an extension of combinatorial topology theory for wait-free computability to support the instructions Compare-And-Swap and Load-Link-Store-Conditional. She also co-authored a publication that provides a read-uncommitted view for blockchain transactions.

---
Filmed & Edited by Bash Films: http://www.BashFilms.com

The CppCon YouTube Channel Is Sponsored By:
JetBrains : http://jb.gg/cpptools
SonarSource: https://www.sonarsource.com/

Filed under: Science & Technology

A Hash Map for Graph Processing Workloads & a Methodology for Transactional Data Structures – CppCon

Add a Comment Cancel reply

Bronze Sponsors