Mirror of https://github.com/amd/blis.git (synced 2026-05-13 18:52:14 +00:00)
Details:
- This implementation transposes A while packing a 16xk panel of the A buffer and passes the packed panel to the 16x3 nn kernel.
- The same implementation also works for the case where B is transposed.

AMD-Internal: [CPUPL-1376]
Change-Id: I81f74deb609926598f62c30f5bd6fc80fb1b9a17
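The transpose-while-packing step described in the details can be sketched roughly as follows. This is a minimal illustration, not the code from this change: the function name pack_a_transposed_16xk, the double element type, the column-major storage of A with leading dimension lda, and the zero padding of a partial panel are all assumptions made for the example. Only the 16xk panel shape comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>

#define MR 16 /* panel height, from the "16xk" in the commit message */

/*
 * Hypothetical helper (not a BLIS function).
 *
 * A is assumed to be stored column-major as a k x m block with leading
 * dimension lda, so the logical operand A^T is m x k and element
 * A^T(i, p) lives at a[p + i * lda].
 *
 * The packed panel is MR x k, stored so that the MR entries of each of
 * the k columns are contiguous. That is the layout a non-transposed
 * ("nn") micro-kernel would read. Rows beyond m_rows are zero-filled
 * (an assumption of this sketch).
 */
static void pack_a_transposed_16xk(const double *a, size_t lda,
                                   size_t m_rows, size_t k,
                                   double *packed)
{
    assert(m_rows <= MR);

    for (size_t p = 0; p < k; ++p) {
        for (size_t i = 0; i < MR; ++i) {
            /* Read along a stored column of A, write along a panel column. */
            packed[p * MR + i] = (i < m_rows) ? a[p + i * lda] : 0.0;
        }
    }
}
```

Because the transpose is folded into the copy that packing already performs, a kernel written for the non-transposed case can consume the packed panel unchanged, which is the point the first bullet makes.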