blis/kernels at d6fcfe734517a1a53fb0fa38d9a650841c9f09b0 - blis

amd/blis

mirror of https://github.com/amd/blis.git synced 2026-05-12 01:59:59 +00:00

Harsh Dave 590c763e22 Implemented ctrsm small kernels

Details:
-- AMD Internal Id: CPUPL-1702
-- Used 8x3 CGEMM kernel with vector fma by utilizing ymm registers
   efficiently to produce 24 scomplex outputs at a time
-- Used packing of matrix A to effectively cache and reuse
-- Implemented kernels using macro based modular approach
-- Added ctrsm_small to the ctrsm_ BLAS path: used single-threaded
   when (m,n)<1000 and multithreaded when (m+n)<320
-- Taken care of --disable_pre_inversion configuration
-- Achieved 13% average performance improvement for sizes less than 1000
-- Modularized all 16 combinations of trsm into 4 kernels
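
The size thresholds in the commit message can be read as a simple dispatch test. The following is a minimal sketch of that reading, not the actual BLIS code: the function name `use_ctrsm_small`, the constant names, and the interpretation of "(m,n)<1000" as "both m and n are below 1000" are assumptions made for illustration.

```c
#include <stdbool.h>

/* Thresholds quoted in the commit message; the names are hypothetical. */
enum { SMALL_DIM_SINGLE_THREAD = 1000, SMALL_SUM_MULTI_THREAD = 320 };

/*
 * Returns true when the small-matrix ctrsm path would be selected.
 * Assumption: "(m,n)<1000" means both dimensions are below 1000.
 * Assumption: n_threads <= 1 means a single-threaded run.
 */
static bool use_ctrsm_small(long m, long n, int n_threads)
{
    if (n_threads <= 1) {
        /* Single thread: both dimensions must be under the limit. */
        return m < SMALL_DIM_SINGLE_THREAD && n < SMALL_DIM_SINGLE_THREAD;
    }
    /* Multiple threads: the sum of the dimensions must be under the limit. */
    return (m + n) < SMALL_SUM_MULTI_THREAD;
}
```

Under this reading, a 500x500 problem takes the small path on one thread but not on several threads, since 500 + 500 is not below 320.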

Change-Id: I557c5bcd8cb7c034acd99ce0666bc411e9c4fe64

2021-11-12 08:58:55 +05:30

armsve

New kernel set for Arm SVE using assembly (#396)

2020-05-21 11:56:45 +05:30

armv7a

Squash-merge 'pr' into 'squash'. (#457)

2020-11-14 09:39:48 -06:00

armv8a

avoid loading twice in armv8a gemm kernel (#403)

2020-05-21 12:37:53 +05:30

bgq

Replaced use of bool_t type with C99 bool.