blis/frame at a3836a560da687dbf86c2a0b7e00a1b5e26a24e6 - blis

amd/blis

mirror of https://github.com/amd/blis.git synced 2026-05-12 10:05:38 +00:00

mkadavil a3836a560d Smart Threading for GEMM (sgemm) v1.

- Cache aware factorization.
Experiments show that, on a generic data set in the SUP path, ic,jc
factorization based on m,n gives better results than factorization
based on mu,nu. Slight adjustments to the factorization w.r.t. matrix
data loads can improve performance further.

- Moving native path inputs to SUP path.
Experiments show that in multi-threaded scenarios, if the per-thread
data falls under the SUP thresholds, taking the SUP path instead of the
native path improves performance. This holds even if the original
matrix dimensions fall in the native path. It does not apply if a
transpose of the A matrix is required.

- Enabling B matrix packing in SUP path.
A performance improvement is observed when the B matrix is packed in
cases where gemm takes the SUP path instead of the native path based on
per-thread matrix dimensions.
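
The three points above can be sketched roughly as follows. This is an
illustration only, not the code in this commit: the names
factorize_threads, under_sup_thresholds and plan_gemm, the threshold
values, and the "every dimension below its threshold" rule are all
assumptions made for the example. The cache-related adjustments
mentioned in the first point are not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical SUP thresholds; real values are tuned per architecture. */
enum { SUP_MT = 512, SUP_NT = 512, SUP_KT = 512 };

typedef struct { int ic; int jc; } ways_t;
typedef struct { bool use_sup; bool pack_b; ways_t ways; } plan_t;

/* Split nt threads into ic x jc ways (ic * jc == nt) so that the
 * per-thread block sizes m/ic and n/jc are as close as possible,
 * i.e. minimize |m*jc - n*ic| over all divisor pairs of nt. */
static ways_t factorize_threads(long long m, long long n, int nt)
{
    assert(m > 0 && n > 0 && nt > 0);
    ways_t best = { 1, nt };
    long long best_cost = llabs(m * nt - n);
    for (int ic = 2; ic <= nt; ++ic) {
        if (nt % ic != 0) continue;          /* ic must divide nt */
        int jc = nt / ic;
        long long cost = llabs(m * jc - n * ic);
        if (cost < best_cost) {
            best_cost = cost;
            best.ic = ic;
            best.jc = jc;
        }
    }
    return best;
}

/* Assumed rule: "small" means every dimension is below its threshold. */
static bool under_sup_thresholds(long long m, long long n, long long k)
{
    return m < SUP_MT && n < SUP_NT && k < SUP_KT;
}

/* Choose the path. A problem that is native-sized as a whole is moved
 * to the SUP path, with B packing enabled, when the per-thread block
 * is SUP-sized and A does not need a transpose. */
static plan_t plan_gemm(long long m, long long n, long long k,
                        int nt, bool a_needs_transpose)
{
    plan_t p = { false, false, factorize_threads(m, n, nt) };

    if (under_sup_thresholds(m, n, k)) {     /* already a SUP input */
        p.use_sup = true;
        return p;
    }
    if (a_needs_transpose) return p;         /* stay on native path */

    /* Per-thread block sizes, rounded up. k is not split by ic,jc. */
    long long m_t = (m + p.ways.ic - 1) / p.ways.ic;
    long long n_t = (n + p.ways.jc - 1) / p.ways.jc;
    if (under_sup_thresholds(m_t, n_t, k)) {
        p.use_sup = true;
        p.pack_b = true;
    }
    return p;
}
```

Under these assumed thresholds, m = n = 2000, k = 200 on 16 threads
factorizes as 4 x 4, gives 500 x 500 per-thread blocks, and is moved to
the SUP path with B packing. The same input with a transposed A stays
on the native path.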

AMD-Internal: [CPUPL-659]

Change-Id: I3b8fc238a0ece1ababe5d64aebab63092f7c6914

2022-05-17 18:10:39 +05:30