Refer to color mm algorithm in Multithreading.md.

2026-04-20 07:38:53 +00:00 · 2018-12-04 13:30:25 -06:00
parent 22384fd2b7
commit 9b688a2d69
1 changed file with 1 addition and 1 deletion
--- a/docs/Multithreading.md
+++ b/docs/Multithreading.md
@@ -104,7 +104,7 @@ Next, which combinations of loops to parallelize depends on which caches are sha
 * For compute resources that have private L2 caches but that share an L3 cache (example: cores on a socket), try parallelizing the `IC` loop. In this situation, threads will share the same packed row panel from matrix B, but pack and compute with different blocks of matrix A.
 * If compute resources share an L2 cache but have private L1 caches (example: pairs of cores), try parallelizing the `JR` loop. Here, threads share the same packed block of matrix A but read different packed micro-panels of B into their private L1 caches. In some situations, parallelizing the `IR` loop may also be effective.

-![The primary algorithm for level-3 operations in BLIS](http://www.cs.utexas.edu/users/field/mm_algorithm.png)
+![The primary algorithm for level-3 operations in BLIS](http://www.cs.utexas.edu/users/field/mm_algorithm_color.png)

 ## Globally at runtime