mirror of
https://github.com/amd/blis.git
synced 2026-07-02 13:17:16 +00:00
- Updated the BF16-to-F32 conversion function to use the correct strides when storing, for the case where the inputs are column-stored.
- Conversion of B is potentially multithreaded, using the threads meant for IC compute. With the wrong strides in the kernel, this led to incorrect writes to the miscellaneous buffer.

AMD-Internal: [CPUPL-7675]

Co-authored-by: Vishal-A <Vishal.Akula@amd.com>
Co-authored-by: Vignesh Balasubramanian <vignbala@amd.com>
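
To illustrate why the store strides matter, here is a minimal sketch in plain C. It is not the actual BLIS kernel: the names `bf16_t`, `bf16_to_f32` and `cvt_bf16_to_f32_cols` and the parameter layout are hypothetical and chosen only for illustration. The point it shows is that the source is read with the source strides and the destination is written with the destination's own strides, so a thread that converts only its slice of columns stays inside its part of the output buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Raw bfloat16 bits. Hypothetical typedef, not the BLIS type. */
typedef uint16_t bf16_t;

/* BF16 is the upper 16 bits of an IEEE-754 binary32 value,
   so widening is a shift into the high half of a 32-bit word. */
static float bf16_to_f32(bf16_t v)
{
    uint32_t bits = (uint32_t)v << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Convert columns [jc_start, jc_end) of a BF16 matrix with k rows to F32.
   Element (i, j) is read from  src[i * rs_src + j * cs_src]
   and written to               dst[i * rs_dst + j * cs_dst].
   Using the source strides for the store (or vice versa) would place
   elements outside the slice this call is supposed to own. */
static void cvt_bf16_to_f32_cols(const bf16_t *src, int rs_src, int cs_src,
                                 float *dst, int rs_dst, int cs_dst,
                                 int k, int jc_start, int jc_end)
{
    assert(k >= 0 && jc_start >= 0 && jc_start <= jc_end);
    for (int j = jc_start; j < jc_end; ++j) {
        for (int i = 0; i < k; ++i) {
            dst[i * rs_dst + j * cs_dst] =
                bf16_to_f32(src[i * rs_src + j * cs_src]);
        }
    }
}
```

In this sketch, a column-stored k x n source has `rs_src = 1` and `cs_src = k`, while a row-major destination has `rs_dst = n` and `cs_dst = 1`. Two callers that each convert a disjoint column range then write disjoint sets of destination elements, which is the property a multithreaded conversion relies on.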