mirror of
https://github.com/amd/blis.git
synced 2026-05-24 18:34:40 +00:00
- Introduced new 8x24 macro kernels.
- 4 new kernels are added for beta 0, beta 1, beta -1
and beta N.
- IR and JR loop moved to ASM region.
- Kernels support row major storage scheme.
- Prefetch of current micro panel of C is enabled.
- Kernel supports negative offsets for A and B matrices.
- Moved alpha scaling from DGEMM kernel to B pack kernel.
- Tuned blocksizes for new kernel.
- Added support for alpha scaling in 24xk pack kernel.
- Reverted back to old b_next computation
in gemm_ker_var2.
- BugFix in 8x24 DGEMM kernel for beta 1,
comparsion for jmp conditions was done using integer
instructions, which caused beta 1 path to never be taken.
Fixed this by changing the comparsion to double.
AMD-Internal: [CPUPL-5262]
Change-Id: Ieec207eea2a164603c8a8ea88e0b1d3095c29a3f