Jove Matrix Performance
Researcher
- Benjamin Thomitzni, PhD Student in the Theoretical and Computational Chemistry Group of Prof. Dr. Andreas Dreuw
Initial problem
- The matrix multiplication code had poor single-core performance and did not scale beyond 4 threads.
What we did
- Changed the data layout and reordered loops to reduce cache misses
- Used a specialized linear algebra library to generate optimized code
- Added thread-safe, performant parallelism using OpenMP
- See https://github.com/ssciwr/jove-performance for more details
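To illustrate the first and third points, here is a minimal C++ sketch of loop reordering combined with an OpenMP parallel loop. It is not taken from the project repository. The row-major `Matrix` alias and the function names `multiply_naive` and `multiply_reordered` are hypothetical, and the actual code may use a different layout, loop order, or a library routine instead.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical storage for this sketch: a square n x n matrix kept
// row-major in one contiguous vector, so element (i, j) is at i * n + j.
using Matrix = std::vector<double>;

// Baseline i-j-k ordering. The innermost loop reads b[k * n + j] with a
// stride of n doubles, which tends to cause cache misses for large n.
Matrix multiply_naive(const Matrix& a, const Matrix& b, std::size_t n) {
  assert(a.size() == n * n && b.size() == n * n);
  Matrix c(n * n, 0.0);
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t j = 0; j < n; ++j)
      for (std::size_t k = 0; k < n; ++k)
        c[i * n + j] += a[i * n + k] * b[k * n + j];
  return c;
}

// Reordered i-k-j ordering. The innermost loop now walks along a row of b
// and a row of c, so memory is accessed contiguously.
// The outer loop is parallelized with OpenMP: each iteration i writes only
// to row i of c, so threads never write to the same element.
// If OpenMP is not enabled at compile time, the pragma is ignored and the
// loop runs serially.
Matrix multiply_reordered(const Matrix& a, const Matrix& b, std::size_t n) {
  assert(a.size() == n * n && b.size() == n * n);
  Matrix c(n * n, 0.0);
  const long rows = static_cast<long>(n);
#pragma omp parallel for
  for (long i = 0; i < rows; ++i) {
    const std::size_t row = static_cast<std::size_t>(i) * n;
    for (std::size_t k = 0; k < n; ++k) {
      const double a_ik = a[row + k];  // reused across the whole inner loop
      for (std::size_t j = 0; j < n; ++j)
        c[row + j] += a_ik * b[k * n + j];
    }
  }
  return c;
}
```

With GCC or Clang, OpenMP is typically enabled with `-fopenmp`. Both functions return the same product; only the memory access pattern and the parallelization differ.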
Result
- Code runs more than 4x faster on a single core
- Near-perfect scaling up to 12 cores on a 12-core/24-thread CPU