mirror of https://github.com/amd/blis.git
synced 2026-07-03 05:37:51 +00:00
- In the existing code, blocksizes for sizes where M >> K, N >> K and K < 500 were not tuned properly for cases where an application runs more than one instance of BLIS in parallel.
- Improved DGEMM performance for sizes where M, N >> K by retuning the blocksizes. Such sizes are used by applications like HPL.

AMD-Internal: [SWLCSG-3338]
Change-Id: Iec17ecc53a6fabf50eedacaf208e4e74a4e21418
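The shape condition the retuning targets (M >> K, N >> K, K < 500) can be written as a simple predicate. This is a minimal C sketch, assuming ">>" is read as "at least `ratio` times larger". The function name `is_large_mn_small_k` and the `ratio` parameter are hypothetical and not part of the BLIS API, and the sketch does not reproduce the actual retuned blocksize values or how BLIS selects them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* K bound stated in the commit message. */
#define K_THRESHOLD 500

/*
 * Hypothetical helper: returns true when a DGEMM of shape (m x k) * (k x n)
 * falls in the "M, N >> K, K < 500" class. `ratio` is an assumed
 * interpretation of ">>" and is not taken from BLIS.
 */
static bool is_large_mn_small_k(size_t m, size_t n, size_t k, size_t ratio)
{
    /* A ratio of 0 would make the M and N checks trivially true. */
    assert(ratio > 0);

    return k < K_THRESHOLD &&
           m >= ratio * k &&
           n >= ratio * k;
}
```

For example, an HPL-style trailing update with `m = n = 20000` and `k = 256` satisfies the predicate for `ratio = 8`, while a square `2000 x 2000 x 2000` product does not, because K exceeds the threshold.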