Jove Matrix Performance

Researcher

Initial problem

  • Matrix multiplication code has poor single-core performance and does not scale beyond 4 threads.

What we did

Result

  • Code runs more than 4x faster on a single core
  • Near-perfect scaling up to 12 cores on a 12-core/24-thread CPU
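
For reference, the two figures quoted above are usually derived from wall-clock timings as follows. This is a minimal sketch, not the project's actual measurement code: the function names are hypothetical, and the timings in the usage lines are made-up round numbers chosen only to illustrate the arithmetic.

```python
def speedup(baseline_seconds, optimized_seconds):
    """Return how many times faster the optimized run is than the baseline."""
    if baseline_seconds <= 0 or optimized_seconds <= 0:
        raise ValueError("timings must be positive")
    return baseline_seconds / optimized_seconds


def parallel_efficiency(single_core_seconds, parallel_seconds, cores):
    """Return parallel speedup divided by core count (1.0 means perfect scaling)."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return speedup(single_core_seconds, parallel_seconds) / cores


# Hypothetical timings, for illustration only (not measured results).
single_core_gain = speedup(8.0, 2.0)              # 8 s baseline, 2 s optimized
scaling = parallel_efficiency(12.0, 1.0, 12)      # 12 s on 1 core, 1 s on 12 cores
print(single_core_gain, scaling)
```

An efficiency close to 1.0 at 12 cores corresponds to the "near-perfect scaling" described above; values well below 1.0 indicate that adding cores is yielding diminishing returns.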

[Figure: Single-core speedup over baseline performance]

[Figure: Parallel speedup vs. single-core baseline]