mirror of
https://github.com/amd/blis.git
synced 2026-04-20 07:38:53 +00:00
Refer to color mm algorithm in Multithreading.md.
@@ -104,7 +104,7 @@ Next, which combinations of loops to parallelize depends on which caches are sha
* For compute resources that have private L2 caches but that share an L3 cache (example: cores on a socket), try parallelizing the `IC` loop. In this situation, threads will share the same packed row panel from matrix B, but pack and compute with different blocks of matrix A.
* If compute resources share an L2 cache but have private L1 caches (example: pairs of cores), try parallelizing the `JR` loop. Here, threads share the same packed block of matrix A but read different packed micro-panels of B into their private L1 caches. In some situations, parallelizing the `IR` loop may also be effective.
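
The two guidelines above can be combined. The following is a minimal sketch, assuming a hypothetical machine with one socket of 8 cores that all share an L3 cache, where each pair of cores shares an L2 cache. It uses the per-loop environment variables `BLIS_IC_NT` and `BLIS_JR_NT`; the thread counts are illustrative assumptions, not tuned values, and the application name in the final comment is a placeholder.

```shell
# Assumed (hypothetical) topology: 8 cores share one L3 cache,
# and each pair of cores shares one L2 cache.

# 4 ways of parallelism in the IC loop: one per group of cores with its own L2.
export BLIS_IC_NT=4

# 2 ways of parallelism in the JR loop: the two cores that share each L2.
export BLIS_JR_NT=2

# With the remaining loops left at one way each, the total thread count
# is the product of the ways requested above.
echo "total threads: $((BLIS_IC_NT * BLIS_JR_NT))"

# Then launch the BLIS-linked program (placeholder name):
# ./my_gemm_app
```

Whether this split outperforms other factorizations of 8 threads depends on the problem shape and the hardware, so it is worth measuring a few alternatives.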


## Globally at runtime