Jove Matrix Performance

Researcher

Benjamin Thomitzni, PhD Student in the Theoretical and Computational Chemistry Group of Prof. Dr. Andreas Dreuw

Initial problem

The matrix multiplication code has poor single-core performance and does not scale beyond 4 threads.

What we did

Change the data layout and re-order loops to reduce cache misses
Use a specialized linear algebra library to generate optimized code
Add thread-safe and performant parallelism using OpenMP
See https://github.com/ssciwr/jove-performance for more details
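
The loop re-ordering and OpenMP steps can be illustrated with a minimal sketch. It is not the project's actual code: the function name `matmul`, the contiguous row-major layout, and the square matrix shape are all assumptions made for illustration. The library-based step is not shown here, because this summary does not name the library that was used.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not taken from the project): returns C = A * B for
// square n x n matrices stored contiguously in row-major order.
std::vector<double> matmul(const std::vector<double>& a,
                           const std::vector<double>& b,
                           std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n);
    std::vector<double> c(n * n, 0.0);

    // Each iteration of the outer loop writes only to row i of C, so threads
    // never write to the same element and no locking is needed.
    // If OpenMP is not enabled, the pragma is ignored and the loop runs serially.
    #pragma omp parallel for
    for (long long i = 0; i < static_cast<long long>(n); ++i) {
        const std::size_t row = static_cast<std::size_t>(i) * n;
        for (std::size_t k = 0; k < n; ++k) {
            const double a_ik = a[row + k];
            // i-k-j order: the innermost loop reads B and updates C with
            // stride 1, instead of striding down a column of B.
            for (std::size_t j = 0; j < n; ++j) {
                c[row + j] += a_ik * b[k * n + j];
            }
        }
    }
    return c;
}
```

With GCC or Clang, OpenMP is typically enabled with `-fopenmp`. The same source compiles and gives the same result without it, which makes it easy to compare serial and parallel runs.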

Result

Code runs more than 4x faster on a single core
Near-perfect scaling up to 12 cores on a 12-core/24-thread CPU
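
For reference, scaling claims of this kind are usually expressed as speedup and parallel efficiency. The sketch below shows the standard definitions. The function names are hypothetical and no measured timings from the project are included.

```cpp
#include <cassert>

// Speedup: how many times faster the parallel run is than the
// single-thread run of the same code.
double speedup(double t_serial, double t_parallel) {
    assert(t_parallel > 0.0);
    return t_serial / t_parallel;
}

// Parallel efficiency: speedup divided by the number of threads.
// A value of 1.0 corresponds to perfect scaling.
double efficiency(double t_serial, double t_parallel, int threads) {
    assert(threads > 0);
    return speedup(t_serial, t_parallel) / threads;
}
```

Under these definitions, a run on 12 threads that takes one twelfth of the single-thread time has an efficiency of 1.0, and "near-perfect" scaling means an efficiency close to that value.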

Copyright © 2023. All rights reserved