Tag: CPU

C++ Function Multiversioning in Windows – by Joe Bialek and Pranav Kant – CppCon 2022

Posted on November 26, 2022
By digitalmedium1

https://cppcon.org/
---

C++ Function Multiversioning in Windows - Joe Bialek and Pranav Kant - CppCon 2022
https://github.com/CppCon/CppCon2022

Original Title: High-performance Load-time Implementation Selection

Programmers often write software that will run on a wide variety of machine types, and writing code that is maximally efficient on all of them can be a challenge. A programmer may therefore want to provide multiple implementations of a particular function, each optimized for a particular class of machines (e.g., an SSE implementation and an AVX implementation of a function). However, when multiple implementations are provided, selecting the best one for the particular target often carries a runtime cost.
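To make that runtime cost concrete, here is a minimal sketch of the conventional approach: choose an implementation once at startup and call it through a function pointer on every use. The names `sum_scalar`, `sum_wide`, `cpu_has_avx2` and `sum` are hypothetical, and `sum_wide` is a plain C++ stand-in (a 4-way unrolled loop) for a real AVX implementation. Only `__builtin_cpu_init` and `__builtin_cpu_supports`, which are GCC/Clang built-ins on x86, are existing APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Two interchangeable implementations with the same signature.
static std::int64_t sum_scalar(const std::int32_t* p, std::size_t n) {
    std::int64_t total = 0;
    for (std::size_t i = 0; i < n; ++i) total += p[i];
    return total;
}

// Stand-in for a SIMD version: four independent accumulators.
static std::int64_t sum_wide(const std::int32_t* p, std::size_t n) {
    std::int64_t a = 0, b = 0, c = 0, d = 0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        a += p[i]; b += p[i + 1]; c += p[i + 2]; d += p[i + 3];
    }
    for (; i < n; ++i) a += p[i];  // leftover elements
    return a + b + c + d;
}

// Feature probe. The built-ins exist in GCC/Clang on x86; on other
// targets this sketch conservatively reports "no AVX2".
static bool cpu_has_avx2() {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") != 0;
#else
    return false;
#endif
}

using SumFn = std::int64_t (*)(const std::int32_t*, std::size_t);

// The choice is made once, during static initialization...
static const SumFn sum_impl = cpu_has_avx2() ? sum_wide : sum_scalar;

// ...but every call still goes through an indirect call via sum_impl.
std::int64_t sum(const std::int32_t* p, std::size_t n) {
    assert(p != nullptr || n == 0);
    return sum_impl(p, n);
}
```

The selection cost is paid only once. The indirect call remains on every invocation, and it also blocks inlining, which is the kind of overhead the talk's load-time approach is meant to remove.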

Our feature combines compiler and OS facilities to select the optimal implementation of a function with no overhead. It is a function multiversioning feature: a programmer writes multiple implementations of a function and tells the compiler which implementation should be used on which target architectures. The compiler uses this information to generate metadata that it includes in the binary. When the binary is loaded, the OS uses this metadata to fix up references to the function, so that every reference points to the implementation encoded as optimal for the current machine.
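The selection step can be modeled as an ordered list of candidates, each tagged with the CPU features it requires. The loader picks the first candidate whose requirements the current machine satisfies. This is a hypothetical model for illustration only. The `Feature` bits, the `Candidate` struct and `select_impl` are invented here and do not describe the actual metadata format used by Visual C++ or Windows 11.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>
#include <vector>

// Hypothetical feature bits; a real loader would derive these from CPUID.
enum Feature : std::uint32_t {
    kSSE42  = 1u << 0,
    kAVX2   = 1u << 1,
    kAVX512 = 1u << 2,
};

// One entry of the (hypothetical) per-function metadata.
struct Candidate {
    std::uint32_t required;  // features this implementation needs
    std::string_view name;   // stands in for the implementation's address
};

// Return the first candidate whose required features are all present.
// Candidates are assumed to be ordered from most to least specialized and
// to end with an entry that requires nothing (the portable fallback).
std::string_view select_impl(const std::vector<Candidate>& candidates,
                             std::uint32_t cpu_features) {
    for (const Candidate& c : candidates) {
        if ((c.required & cpu_features) == c.required) return c.name;
    }
    assert(false && "metadata must end with an unconditional fallback");
    return {};
}
```

In the feature the talk describes, this choice is made once at load time, and the OS then fixes up references to the function. No per-call selection code like this runs afterwards; the model only captures which implementation gets chosen.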

This feature enables extremely fine-grained function specialization without any overhead from indirect calls, jump tables, test-and-branch checks, etc. The specializations can be based on CPU architecture, model, features supported, or nearly any other characteristic of the current system.

This talk will walk through how this feature works, how it is similar to existing features (e.g., GCC function multiversioning using ifuncs), and what implementation details compilers and OSes must consider when implementing it. We will discuss our implementation of this feature in Visual C++ and Windows 11 and demo the feature in action.
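For comparison, here is a small example of the existing GCC function multiversioning the abstract mentions, which Clang also accepts on x86-64 ELF targets. Two definitions share one name and differ only in their `target` attribute. The compiler then emits an ifunc whose resolver binds calls to the best version the CPU supports. The function name `count_bits` is illustrative. The `target` attribute and `__builtin_popcountll` are real GCC/Clang extensions.

```cpp
#include <cassert>

#if defined(__GNUC__) && defined(__x86_64__) && defined(__ELF__)
// GCC/Clang function multiversioning: same name, different target
// attributes; an ifunc resolver selects the version at bind time.
__attribute__((target("default")))
int count_bits(unsigned long long x) {
    int n = 0;
    for (; x != 0; x &= x - 1) ++n;  // clear the lowest set bit each step
    return n;
}

__attribute__((target("popcnt")))
int count_bits(unsigned long long x) {
    return __builtin_popcountll(x);  // may use the POPCNT instruction here
}
#else
// Other targets: a single portable definition.
int count_bits(unsigned long long x) {
    int n = 0;
    for (; x != 0; x &= x - 1) ++n;
    return n;
}
#endif
```

This approach depends on the ELF ifunc mechanism, so it is limited to toolchains and platforms that support ifuncs.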
---

Joe Bialek

Joe Bialek is a security engineer in the Microsoft Security Response Center's Vulnerability & Mitigations team. Joe spends his time eliminating vulnerability classes, creating exploit mitigations, and finding security bugs.
---

Pranav Kant

Pranav is a software engineer at Microsoft on the Visual C++ compiler backend team, where he focuses on code generation, linking, and post-link binary-level tooling.
---

Videos Filmed & Edited by Bash Films: http://www.BashFilms.com
YouTube Channel Managed by Digital Medium Ltd https://events.digital-medium.co.uk

#cppcon #programming #function

Filed under: Uncategorized
Tagged with: cppfunction, CPU, gccfunctions, maintainability, multiversioning, programming, softwarearchitecture, Windows

Scalable and Low Latency Lock-free Data Structures in C++ – by Alexander Krizhanovsky – CppCon 2022

Posted on November 24, 2022
By digitalmedium1

https://cppcon.org/
---

Scalable and Low Latency Lock-free Data Structures in C++ - Alexander Krizhanovsky - CppCon 2022
https://github.com/CppCon/CppCon2022

Imagine that your program uses many threads that insert into and look up a large data structure like std::map or std::unordered_map millions of times per second. Typically, you have to switch to a lock-free data structure for this task. Lock-free approaches scale data structures well on multi-core systems, but hash tables and trees need some reorganization as more and more items are inserted, and these reorganizations are hard to make lock-free. Contention is not the only problem, though: to build a high-performance data structure for modern hardware, we also need to use memory efficiently. Cache-conscious data structures address this by using CPU caches efficiently and reducing the number of accesses to main memory.
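To make the lock-free idea concrete, here is a minimal sketch of the simplest lock-free hash set. It is not the talk's implementation. It uses a fixed array of buckets, and each bucket's list head is updated with compare-and-swap. The class name `LockFreeSet` is hypothetical. The set deliberately supports only insert and lookup with a fixed bucket count, which avoids exactly the hard parts mentioned above: reorganization (resizing) and safe memory reclamation.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>

// Insert-only lock-free hash set: each bucket is a singly linked list whose
// head pointer is updated with CAS. Nodes are never unlinked or freed while
// the set is in use, which sidesteps ABA and memory-reclamation problems.
class LockFreeSet {
    struct Node {
        std::uint64_t key;
        Node* next;
    };

public:
    explicit LockFreeSet(std::size_t buckets)
        : n_(buckets), heads_(new std::atomic<Node*>[buckets]) {
        assert(buckets > 0);
        for (std::size_t i = 0; i < n_; ++i)
            heads_[i].store(nullptr, std::memory_order_relaxed);
    }
    LockFreeSet(const LockFreeSet&) = delete;
    LockFreeSet& operator=(const LockFreeSet&) = delete;

    // Must not run concurrently with other operations.
    ~LockFreeSet() {
        for (std::size_t i = 0; i < n_; ++i) {
            Node* p = heads_[i].load(std::memory_order_relaxed);
            while (p) { Node* next = p->next; delete p; p = next; }
        }
    }

    bool contains(std::uint64_t key) const {
        for (Node* p = bucket(key).load(std::memory_order_acquire); p; p = p->next)
            if (p->key == key) return true;
        return false;
    }

    // Returns false if the key was already present.
    bool insert(std::uint64_t key) {
        std::atomic<Node*>& head = bucket(key);
        Node* node = new Node{key, head.load(std::memory_order_acquire)};
        for (;;) {
            for (Node* p = node->next; p; p = p->next)
                if (p->key == key) { delete node; return false; }
            // Publish only if the head is still the one we scanned from. Nodes
            // are only ever prepended, so an unchanged head means no other
            // insert completed in this bucket since the scan.
            if (head.compare_exchange_weak(node->next, node,
                                           std::memory_order_release,
                                           std::memory_order_acquire))
                return true;
            // On failure node->next now holds the current head: rescan.
        }
    }

private:
    std::atomic<Node*>& bucket(std::uint64_t key) const {
        return heads_[std::hash<std::uint64_t>{}(key) % n_];
    }

    std::size_t n_;
    std::unique_ptr<std::atomic<Node*>[]> heads_;
};
```

Supporting erase and growth is where the real difficulty lies. The structures surveyed in the talk exist to solve those problems.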

This talk quickly surveys several lock-free (primarily hash table variants) and cache-conscious (mostly tree) data structures. We'll discuss the best use cases for these data structures and their limitations. Besides average-case performance, we'll also focus on worst-case scenarios, which may introduce high tail latency on large, busy systems. Tail (high-percentile) latency is a severe problem because it can reach significant values, and a small percentage of problem cases can be large in absolute terms: if you serve 1M users, even 0.1% of them (1,000 users) experiencing high processing times is a problem.
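Tail latency is usually reported as a high percentile of measured latencies. The helper below is a small sketch that is not from the talk. It uses the nearest-rank definition, which is one of several common percentile definitions. The function name `percentile` is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it. The vector is taken by value so
// the caller's data is not reordered; std::nth_element is O(n) on average.
double percentile(std::vector<double> samples, double p) {
    assert(!samples.empty() && p > 0.0 && p <= 100.0);
    const auto rank = static_cast<std::size_t>(
        std::ceil(p * static_cast<double>(samples.size()) / 100.0));  // 1-based
    const auto nth = samples.begin() + static_cast<std::ptrdiff_t>(rank - 1);
    std::nth_element(samples.begin(), nth, samples.end());
    return *nth;
}
```

Consider 995 samples of 1.0 ms and 5 samples of 100.0 ms. Here `percentile(lat, 99)` returns 1.0 while `percentile(lat, 99.9)` returns 100.0, so the p99 figure alone would hide the slow requests.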

The main part of the talk is a step-by-step C++ implementation of a Hash Trie, a hybrid lock-free, cache-conscious data structure that provides good access times in both the average and worst cases. We'll microbenchmark the data structure and compare it with other data structures.
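The core idea of a hash trie can be sketched in a few dozen lines. Consume the key's hash a few bits at a time to pick a child slot. Then use compare-and-swap either to claim an empty slot or to replace a conflicting leaf with a deeper node. This is a simplified, hypothetical version and not the talk's open-source implementation: it supports insert and lookup only, uses 4 bits per level, and has no memory reclamation and no cache-line tuning. The class `HashTrie` and the splitmix64-style `mix` function are assumptions made for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Insert-only lock-free hash trie over 64-bit keys (simplified sketch).
// Each level consumes 4 bits of the key's hash, so an inner node has 16
// slots. A slot is empty, holds a leaf (one key), or points to a deeper node.
class HashTrie {
    static constexpr int kBits = 4;
    static constexpr int kFanout = 1 << kBits;
    static constexpr int kLevels = 64 / kBits;

    struct Item {
        explicit Item(bool leaf) : is_leaf(leaf) {}
        const bool is_leaf;
    };
    struct Leaf : Item {
        explicit Leaf(std::uint64_t k) : Item(true), key(k) {}
        const std::uint64_t key;
    };
    struct Inner : Item {
        Inner() : Item(false) {
            for (auto& s : slots) s.store(nullptr, std::memory_order_relaxed);
        }
        std::atomic<Item*> slots[kFanout];
    };

    // splitmix64 finalizer: a bijection, so distinct keys get distinct hashes
    // and two leaves can always be separated within kLevels levels.
    static std::uint64_t mix(std::uint64_t x) {
        x ^= x >> 30; x *= 0xbf58476d1ce4e5b9ULL;
        x ^= x >> 27; x *= 0x94d049bb133111ebULL;
        return x ^ (x >> 31);
    }
    static std::size_t index(std::uint64_t h, int depth) {
        return static_cast<std::size_t>((h >> (depth * kBits)) & (kFanout - 1));
    }
    static void destroy(Item* it) {
        if (it == nullptr) return;
        if (it->is_leaf) { delete static_cast<Leaf*>(it); return; }
        Inner* in = static_cast<Inner*>(it);
        for (auto& s : in->slots) destroy(s.load(std::memory_order_relaxed));
        delete in;
    }

    Inner root_;

public:
    HashTrie() = default;
    HashTrie(const HashTrie&) = delete;
    HashTrie& operator=(const HashTrie&) = delete;
    ~HashTrie() {  // must not run concurrently with other operations
        for (auto& s : root_.slots) destroy(s.load(std::memory_order_relaxed));
    }

    bool contains(std::uint64_t key) const {
        const std::uint64_t h = mix(key);
        const Inner* node = &root_;
        for (int depth = 0; depth < kLevels; ++depth) {
            const Item* cur = node->slots[index(h, depth)].load(std::memory_order_acquire);
            if (cur == nullptr) return false;
            if (cur->is_leaf) return static_cast<const Leaf*>(cur)->key == key;
            node = static_cast<const Inner*>(cur);
        }
        return false;
    }

    // Returns false if the key was already present.
    bool insert(std::uint64_t key) {
        const std::uint64_t h = mix(key);
        Inner* node = &root_;
        Leaf* fresh = nullptr;  // allocated on first need, published by CAS
        int depth = 0;
        for (;;) {
            assert(depth < kLevels);
            std::atomic<Item*>& slot = node->slots[index(h, depth)];
            Item* cur = slot.load(std::memory_order_acquire);
            if (cur == nullptr) {  // empty slot: try to claim it
                if (fresh == nullptr) fresh = new Leaf(key);
                if (slot.compare_exchange_strong(cur, fresh, std::memory_order_release,
                                                 std::memory_order_acquire))
                    return true;
                continue;  // lost the race: re-examine the same slot
            }
            if (!cur->is_leaf) { node = static_cast<Inner*>(cur); ++depth; continue; }
            Leaf* other = static_cast<Leaf*>(cur);
            if (other->key == key) { delete fresh; return false; }
            // Same hash prefix, different key: push the existing leaf down a level.
            Inner* split = new Inner();
            split->slots[index(mix(other->key), depth + 1)].store(other, std::memory_order_relaxed);
            if (slot.compare_exchange_strong(cur, split, std::memory_order_release,
                                             std::memory_order_acquire)) {
                node = split; ++depth;
            } else {
                delete split;  // never published, so safe to free
            }
        }
    }
};
```

Growth only ever splits a single slot, so there is no global rehash step. This is one reason trie-style structures are attractive when table-wide reorganization is hard to make lock-free.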

You'll learn about:
* when standard containers and locking mechanisms aren't enough
* several advanced data structures: split-ordered lists and other variants of lock-free hash tables, tries (Patricia trees), and hybrid data structures
* x86-64 memory ordering, the cache hierarchy, and operating system preemption, and how to apply all this knowledge to implement a very fast data structure
* gotchas of data structure benchmarking, such as key distribution, latency vs. throughput, worst cases, and so on
* an open-source, lock-free, cache-conscious Hash Trie implementation
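Split-ordered lists, one of the structures listed above, rest on one trick. All items live in a single lock-free sorted list ordered by bit-reversed hash. Doubling the bucket count therefore only inserts new sentinel nodes and never moves existing items. The helpers below sketch that key transformation following the original split-ordered list design (Shalev and Shavit). The function names are assumptions, and the list itself is omitted.

```cpp
#include <cassert>
#include <cstdint>

// Reverse the bit order of a 32-bit word.
std::uint32_t reverse_bits(std::uint32_t x) {
    std::uint32_t r = 0;
    for (int i = 0; i < 32; ++i) {
        r = (r << 1) | (x & 1u);
        x >>= 1;
    }
    return r;
}

// Sort key for a real element: set the top bit, then reverse, so the result
// is odd. The key's own top bit is overwritten, so keys are 31-bit hashes.
std::uint32_t so_regular_key(std::uint32_t key) {
    return reverse_bits(key | 0x80000000u);
}

// Sort key for the sentinel ("dummy") node that starts bucket b: always even,
// so it sorts before every element that belongs to that bucket.
std::uint32_t so_dummy_key(std::uint32_t bucket) {
    return reverse_bits(bucket);
}
```

With 4 buckets, key 5 belongs to bucket 1. Its sort key falls between the sentinels of buckets 1 and 3, which are adjacent in bit-reversed order. After doubling to 8 buckets, the new sentinel for bucket 5 slots in just before it, and the element itself never moves.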
---

Alexander Krizhanovsky

Alexander is the CEO of Tempesta Technologies, Inc., and is the architect of Tempesta FW, a high performance open source Linux application delivery controller. Alexander is responsible for the design and performance of several products in the areas of network traffic processing and databases. He designed the core architecture of a Web application firewall, mentioned in the Gartner Magic Quadrant '15, and the MariaDB temporal data tables.

Alexander has given talks at Netdev, SCALE, Linux Conf Australia, MariaDB user conferences, All Things Open, FOSDEM, Percona Live, IBM CASCON, and many other conferences. Alexander is also the author of a very fast lock-free MPMC ring buffer queue, published in Linux Journal in 2013.
---

Videos Streamed & Edited by Digital Medium: http://online.digital-medium.co.uk

#cppcon #programming #datastructures

Filed under: Uncategorized
Tagged with: code, CPU, datastructures, hashtables, hashtrees, programming

High Speed Query Execution with Accelerators and C++ – Alex Dathskovsky – CppCon 2022

Posted on November 14, 2022
By digitalmedium1

https://cppcon.org/
---

High Speed Query Execution with Accelerators and C++ - Alex Dathskovsky - CppCon 2022
https://github.com/CppCon/CppCon2022

Large-scale analytics of structured and semi-structured data has become a pivotal workload in many computing domains, from science, through finance, to social networks. Big data analytics has proven to be highly CPU-bound and requires immense compute resources. Yet unlike the deep learning domain, which has gone through a couple of hardware acceleration cycles, big data analytics still runs on stock CPUs. A fundamental reason for this lack of hardware acceleration is that big data analytics is far more computationally diverse than deep learning. As a result, a hardware accelerator for big data analytics requires careful balancing of dedicated hardware acceleration with a programmable and flexible fabric.

Here at Speedata we are developing the first hardware accelerator for big data analytics. Our SoC relies on a novel massively parallel, non-von Neumann processor coupled with dedicated hardware accelerators. But what good is a hardware accelerator without an optimized software stack?

This talk will discuss how we combine C++ and Python to dynamically generate custom query code and compile it for our massively parallel processor. We will discuss our aggressive metaprogramming framework, which stretches C++'s boundaries to the cutting edge. We will further discuss how our code is created, some of the tricks we use to generate our accelerated code, and why C++17 and C++20 are among the most important aspects of our development.
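The abstract does not show Speedata's framework. The general idea of using C++17 templates to build specialized query code can still be sketched, however. Filter predicates are composed with a fold expression, and filtering and aggregation are fused into one loop that the compiler can fully inline. Everything here (`where_all`, `sum_where`, the `Order` row type) is hypothetical, and it runs on an ordinary CPU rather than an accelerator.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Combine any number of row predicates into one predicate with a C++17 fold
// expression; each predicate's type is known, so calls can be inlined.
template <typename... Preds>
auto where_all(Preds... preds) {
    return [=](const auto& row) { return (preds(row) && ...); };
}

// SELECT SUM(project(row)) FROM rows WHERE pred(row), as one fused loop.
template <typename Row, typename Pred, typename Proj>
std::int64_t sum_where(const std::vector<Row>& rows, Pred pred, Proj project) {
    std::int64_t total = 0;
    for (const Row& r : rows)
        if (pred(r)) total += project(r);
    return total;
}

// Hypothetical row type for the example.
struct Order {
    int quantity;
    int price;
    bool shipped;
};
```

Each distinct combination of predicates instantiates its own specialized loop. This is the compile-time analogue of generating custom code per query.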
---

Alex Dathskovsky

Alex has over 16 years of software development experience, working on systems, low-level generic tools, and high-level applications. Alex has worked as an integration/software developer at Elbit, a senior software developer at Rafael, a technical leader at Axxana, and a software manager at Abbott Israel. He is now a group manager and technical manager at Speedata.io, an exciting startup that will change big data and analytics as we know it. In his current job, Alex is developing a new CPU/APU system using C++20, massive metaprogramming, and LLVM development to create the next big thing for big data.

Alex is a C++ expert with strong experience in template metaprogramming. Alex also teaches a course on the new features of modern C++, aiming to motivate companies to move to the latest standards.
---

Videos Filmed & Edited by Bash Films: http://www.BashFilms.com
YouTube Channel Managed by Digital Medium Ltd https://events.digital-medium.co.uk

#cppcon #programming #cpp

Filed under: Uncategorized
Tagged with: Cpp17, CPU, presentation, programming, query, queryexecution