Tag: SIMD

Implementing Understandable World Class Hash Tables in C++ – by Eduardo Madrid & Scott Bruce CppCon 2022

https://cppcon.org/
---

A Success Story in Implementing Understandable World Class Hash Tables in Cpp - Eduardo Madrid and Scott Bruce - CppCon 2022
https://github.com/CppCon/CppCon2022

We present a success story about implementing one of the most important data structures, hash tables, with world class performance.

C++ allowed maintaining the Robin Hood hash table invariant (abbreviated RH) with the performance of parallel computation, even using only normal integer operations:

1. The execution costs of maintaining this invariant were minimized
2. The benefits are realized beyond other implementations
3. The actual code is easier to understand than alternatives in the performance S-tier

RH is a powerful invariant that allows implementations to unlock non-obvious and substantial benefits including cache locality.

Our solution for this challenge is to use parallel computation techniques that require only normal integer processor operations, and can be scaled up to "SIMD" operations (SIMD means "Single Instruction Multiple Data").

The parallel programming abstraction layer is composable, programmer understandable, and friendly to the optimizing compiler. We will prove these assertions with real source code, benchmarks and Compiler Explorer disassembly.

This will illustrate how specifics of RH and the scalable SIMD-based implementation gives not-seen-before improvements: using the bits devoted to "Probe Sequence Length" not just for cache locality but also reducing false matches exponentially.

These examples gives us confidence that fully realizing the powers of this invariant will outperform alternative schemes.
---

Eduardo Madrid

Tech Lead at Snap, Inc., helping performance on augmented reality of the Snapchat app. Author of open source libraries and components such as the Zoo type-erasure framework. Prior experience at Fintech, including automated trading. Several times presenter at CPPCon, C++ Now, C++ London

Scott Bruce

Scott has been writing software professionally for 20+ years.
He's been entranced by performance and efficiency at least 15 of those 20+ years.
He worked in distributed systems, Search, Adwords, Adsense and ML at Google for 13 years. He worked 4.5 years at Snapcht, on monetization systems, performance, and advanced mobile telemetry systems. He is currently an engineer at Rockset, working on distributed system performance and real time analytics. Presenter of a production software talk at Google, Snap, UCLA, USC, Caltech and others over the last ten plus years.
---

Videos Filmed & Edited by Bash Films: http://www.BashFilms.com
YouTube Channel Managed by Digital Medium Ltd https://events.digital-medium.co.uk

#cppcon #programming #hashtable

Filed under: UncategorizedTagged with: , , , ,

Fast C++ by using SIMD Types with Generic Lambdas and Filters – Andrew Drakeford – CppCon 2022

https://cppcon.digital-medium.co.uk/tag/cppcon/">cppcon.org/
---

Fast C++ by using SIMD Types with Generic Lambdas and Filters - Andrew Drakeford - CppCon 2022
https://github.com/CppCon/CppCon2022

We show that very high-performance code can be generated when generic lambdas are instantiated with vectorized numeric types ( SIMD wrappers ). Trivial lambdas can be used to create STL-like functionality but with a performance as high as 10-20 G Flops which is competitive with the standard libraries and some commercial libraries. A small framework that supports memory management, alignment and padding of vectors is used to support this style of cppcon.digital-medium.co.uk/tag/programming/">programming. It applies users’ lambdas and filters over vector-like objects.

The performance of trivial implementations of some STL-like functions, memcopy , inner_product , sum of squared values, max_element, and accumulate ( with error correction) is investigated. In some cases, the STL generates scalar instructions and performs quite poorly when compared with the lambda generated code which is branchless and makes full use of the SIMD instructions.

Branching can be a problem for vectorized code. We explore cases where the branches have light, medium, and heavy calculation loads and different frequencies of being traversed. With frequent branching, we find that select operations are useful for handling conditional constants. This is also true for conditionals with lightweight and middleweight expressions in their branches. Our compound operation, transformSelect computes both branches and conditionally blends the in-register results appears to be quite a useful tool.

When heavy branches are traversed infrequently, we can filter to a value-based contiguous view and then perform a transform efficiently using vectorized instructions (filterTransform). In the limiting case where all branches have very heavy compute when compared to the cost of filtering, it is best to filter the cases to separate contiguous regions of memory and then apply the vectorized algorithms so that all the registers lanes can be used in the calculation.

We illustrate this with an example of writing a vectorized version of the inverse cumulative normal distribution function. We use VC++, Intel 2022, and clang compilers and compare the performance of different implementation approaches on silver and gold/W Xeons with the same function in Intel’s short vector math library (SVML).
---

Andrew Drakeford

A Physics PhD who started developing C++ applications in the early 90's at British Telecom labs. For the last two decades he has worked in finance developing calculation libraries and trading systems in C++.
---

Videos Streamed & Edited by Digital Medium: http://online.digital-medium.co.uk

#cppcon.digital-medium.co.uk/tag/cppcon/">cppcon #cppcon.digital-medium.co.uk/tag/programming/">programming #cpp

Filed under: UncategorizedTagged with: , , , ,

Optimizing Binary Search – Sergey Slotin – CppCon 2022

https://cppcon.org/
---

Optimizing Binary Search - Sergey Slotin - CppCon 2022
https://github.com/CppCon/CppCon2022

People generally don’t get very excited over 5-10% performance improvements in some databases. Yes, this is what software engineers are paid for, but these types of optimizations tend to be too intricate and system-specific to be readily generalized to other software.

Instead, the most fascinating showcases of performance engineering are multifold improvements of textbook algorithms: the kinds that everybody knows and deemed so simple that it would never even occur to try to optimize them in the first place. These optimizations are simple and instructive and can very much be adopted elsewhere — and they are surprisingly not as rare as you would think.

In this talk, we will focus on one such fundamental algorithm — binary search — and discuss several ways of making it faster using branchless programming, memory layout optimization, and SIMD instructions, and eventually develop an algorithm that is up to ~15x faster than std::lower_bound.
---

Sergey Slotin
Sergey Slotin is the author of Algorithms for Modern Hardware (en.algorithmica.org/hpc).
---

Videos Streamed & Edited by Digital Medium: http://online.digital-medium.co.uk

#cppcon #programming #binarysearch

Filed under: UncategorizedTagged with: , , , , ,

Optimizing A String Class for Computer Graphics in Cpp – by Zander Majercik & Morgan McGuire – CppCon 22

https://cppcon.digital-medium.co.uk/tag/cppcon/">cppcon.org/
---

SIMDString for Computer Graphics in C++ - Zander Majercik and Morgan McGuire - CppCon 2022
https://github.com/CppCon/CppCon2022

Video games and other real-time 3D graphics use a surprising number of C++ text strings, and need them to be fast. For example, strings are tags for language localization, UI elements, variable names for binding to GPU shaders and CPU interpreted scripts, and snippets for runtime code generation in those languages.

Our open source SIMDString class is a drop-in replacement for std::string optimized for graphics applications, which tend to perform a lot of copy construction and concatenation on relatively small strings. SIMDString was originally created for x86 and x64 desktop in the G3D Innovation Engine and later generalized for ARM edge and mobile at NVIDIA and Roblox. Versions have been in production use for a decade now.

SIMDString embeds short strings in the class to reduce heap allocation, intentionally aliases immutable const-segment strings, uses a freelist optimized allocator, leverages SIMD instructions, and has several micro-optimizations for good performance on all platforms. It is >5x faster than std::string and outperforms the previous optimized strings released by Facebook and Electronic Arts on which it builds.

In this talk we introduce SIMDString with a detailed case study of its performance when working with GPU shaders. We review the various optimizations one could use for strings and show why the ones employed in SIMDString are a good combination for graphics–as well as why some seemingly good ideas are actually inefficient in this application. We conclude with benchmark results for multiple string classes showing the tradeoffs between implementations.
---

Zander Majercik

Zander Majercik is a Research Scientist at NVIDIA. His main research interest is novel rendering algorithms with a focus on ray-tracing. He has also worked on projects in vision science, novel display technologies for XR systems, and deep learning algorithms. He graduated from Williams College with a BA in Computer Science.
---

Morgan McGuire

Morgan McGuire is the Chief Scientist at Roblox, channeling and accelerating innovation across the company to build a creative, safe, civil, and scalable Metaverse. Roblox combines social interaction with a dynamic 3D environment and economy, with research impact spanning both technology and social sciences.

Morgan is also known for previous work as a professor, game developer, and industry researcher. As Director of Hyperscale Graphics Research at NVIDIA, Morgan accelerated cloud graphics, esports, ray tracing, and augmented and virtual reality. Morgan's product work includes the NVIDIA RTXGI and Reflex SDKs and RTX GPUs; the Skylanders®, Call of Duty®, Marvel Ultimate Alliance®, and Titan Quest® series of video games series; the Unity game engine; the E Ink display used in the Amazon Kindle®; and the PeakStream high performance computing architecture acquired by Google. Morgan's scientific publications and patents span many areas of computer science, from compilers and networking to 3D graphics.

Morgan cochaired the EuroGraphics Symposium on Rendering, the ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games, the ACM SIGGRAPH Symposium on Non-Photorealistic Animation and Rendering, and the ACM SIGGRAPH / EuroGraphics High Performance Rendering conference, and was the founding Editor-in-Chief of the Journal of Computer Graphics Techniques. Morgan is the author or coauthor of “the bible” of 3D, Computer Graphics: Principles & Practice 3rd Edition, The Graphics Codex, Creating Games, the G3D Innovation Engine, the Markdeep document system, and chapters of several GPU Gems, ShaderX, and GPU Pro volumes.

Morgan holds faculty positions at the University of Waterloo and McGill University and was a full professor at Williams College. Morgan received Ph.D. and M.S. degrees from Brown University and M.Eng. and B.S. degrees from MIT.

Videos Filmed & Edited by Bash Films: http://www.BashFilms.com
YouTube Channel Managed by Digital Medium Ltd https://events.digital-medium.co.uk

#cppcon.digital-medium.co.uk/tag/cppcon/">cppcon #cppcon.digital-medium.co.uk/tag/programming/">programming #cpp

Filed under: UncategorizedTagged with: , , , , , ,

The Most Important Optimizations to Apply in Your C++ Programs – Jan Bielak – CppCon 2022

https://cppcon.digital-medium.co.uk/tag/cppcon/">cppcon.org/
---

The Most Important Optimizations to Apply in Your C++ Programs - Jan Bielak - CppCon 2022
https://github.com/CppCon/CppCon2022

Writing efficient programs is hard. This is because it requires a lot of knowledge, experience and strategic thinking. There have been many talks on optimization and often each addresses a single concept. Being able to achieve a bird’s eye view of factors affecting performance often requires many hours of researching the topic. To lessen the mental burden of optimizing programs, I have picked out the techniques, I believe are most important. During the talk, I will present them in an organized manner and provide practical examples of how they can be applied.

I will first discuss what I believe are the main goals efficient programs strive to achieve. Then, I will present the general methods of achieving those goals. Then, for the majority of the talk, we will discuss a few dozen performance opportunities. For each of them, I will explain the underlying mechanism of how the optimisation works. I will avoid bluntly giving guidelines to follow without explanation. Each of the techniques naturally comes with its costs, and those will be discussed as well.

I will additionally discuss various performance pitfalls. These are sometimes called “premature pessimisations” in contrast to the often used term of “premature optimizations”. I will show examples of optimizations which do not incur any cost on program readability or maintainability and as such should be considered performance best practices. Avoiding their use doesn’t improve code in any manner, while making it slower.

This talk is intended for a diverse audience, as after all, probably most of the C++ community is interested in performance. It is appropriate for hobbyists and professionals alike, with varying experience with the language, due to the gradual increase in difficulty of examples. It will be a time productively spent.
---

Jan Bielak

Jan Bielak is a student at the Warsaw Staszic High School. His main areas of interest are physics and computer science. He is especially into advanced C++ cppcon.digital-medium.co.uk/tag/programming/">programming and physically based real-time rendering. He also hosts and educational YouTube channel. He is involved in the CyberDuck project and in the PaSh project. In free time, he likes to create renders in Blender.

Website: janbielak.com
GitHub: janekb04
YouTube: JBGraphics
---

Videos Streamed & Edited by Digital Medium: http://online.digital-medium.co.uk

#cppcon.digital-medium.co.uk/tag/cppcon/">cppcon #cppcon.digital-medium.co.uk/tag/programming/">programming #cpp

Filed under: UncategorizedTagged with: , , , , , , , ,