blis/kernels
Vignesh Balasubramanian 2ad25a7180 ZGEMM kernel performance improvement for k=1 sizes:
The current zgemm implementation exploits SIMD parallelism along the k
dimension, which performs well when k is large. For inputs with k=1,
however, that dimension offers no parallelism, so it is better to
exploit SIMD parallelism along the m and n dimensions instead. This
commit does so by reordering the loops and loading column vectors
from A.

AMD-Internal: [CPUPL-2236]
Change-Id: Ibfa29f271395497b6e2d0127c319ecb4b883d19f
2022-06-30 07:19:52 -04:00
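
The loop reordering described in the commit message can be illustrated with a scalar reference. This is only a minimal sketch, not the kernel from the commit: the function name `zgemm_k1_sketch`, its signature, and the column-major layout are assumptions made for illustration. With k=1 the product reduces to a rank-1 update, so the inner loop runs down a contiguous column of C and reuses the same column vector of A for every column.

```c
#include <complex.h>
#include <stddef.h>

/* Hypothetical helper, not part of BLIS: computes
 *   C := beta*C + alpha * a * b   for k = 1,
 * assuming column-major storage.
 *   a   : the single m x 1 column of A (contiguous)
 *   b   : the single 1 x n row of B, element j at b[j*ldb]
 *   c   : m x n matrix with leading dimension ldc
 * Unlike BLAS, this sketch always reads C, even when beta is zero. */
static void zgemm_k1_sketch(size_t m, size_t n,
                            double complex alpha,
                            const double complex *a,
                            const double complex *b, size_t ldb,
                            double complex beta,
                            double complex *c, size_t ldc)
{
    for (size_t j = 0; j < n; ++j) {
        /* One scalar from B per column of C, pre-scaled by alpha. */
        const double complex ab = alpha * b[j * ldb];
        double complex *cj = c + j * ldc;

        /* Contiguous sweep down the column of C: with k = 1 there is
         * no reduction over k, so m is the dimension left to vectorize. */
        for (size_t i = 0; i < m; ++i)
            cj[i] = beta * cj[i] + ab * a[i];
    }
}
```

A tuned kernel would replace the inner loop with vector loads from the column of A and a broadcast of the scalar from B; the sketch only shows the loop order.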