Mirror of https://github.com/amd/blis.git (synced 2026-04-19 23:28:52 +00:00)
Fixed undefined behavior trsm ukr bug in bdd46f9.
Details:
- Fixed a bug that manifested whenever a configuration registered
optimized microkernels and the trsm operation (or kernel) was
invoked. The bug resulted from the optimized microkernels' register
blocksizes conflicting with the hard-coded values (expressed as
constant loop bounds) used in the new reference trsm ukernels that
were introduced in bdd46f9, so those loops no longer matched the
dimensions of the micro-tiles they were actually given. The fix was
easy: reverting to the implementation that uses variable-bound loops,
which amounted to changing an #if 0 to #if 1 (since I preserved the
older implementation in the file alongside the new code based on
constant-bound loops). Note that this fix must be permanent, since
trsm kernel code with constant-bound loops can never work with gemm
ukernels that use different register blocksizes.
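The fix hinges on the difference between variable-bound and constant-bound loops. As a minimal sketch (not BLIS's actual reference kernel, packed layout, or API), the hypothetical lower-triangular trsm micro-kernel below takes its loop bounds `mr` and `nr` as runtime arguments, assumed here to come from whatever register blocksizes the gemm microkernel registered. The names `trsm_l_ref_var` and `trsm_check` are illustrative, and the sketch omits any packing or bookkeeping a production kernel would perform.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical variable-bound reference kernel: solves L * X = B in place,
   where L is mr x mr lower triangular (column-major, leading dimension
   ld_l) and B is mr x nr (row-major, row stride rs_b). Because mr and nr
   are runtime arguments, the loops always match the tile actually passed
   in, whatever register blocksizes the gemm microkernel uses. */
static void trsm_l_ref_var(size_t mr, size_t nr,
                           const double *l, size_t ld_l,
                           double *b, size_t rs_b)
{
    assert(ld_l >= mr && rs_b >= nr);
    for (size_t i = 0; i < mr; ++i) {
        const double lii = l[i + i * ld_l];
        for (size_t j = 0; j < nr; ++j) {
            /* Forward substitution: rows 0..i-1 of b already hold X. */
            double beta = b[i * rs_b + j];
            for (size_t k = 0; k < i; ++k)
                beta -= l[i + k * ld_l] * b[k * rs_b + j];
            b[i * rs_b + j] = beta / lii;
        }
    }
}

/* Builds a well-conditioned L and a known X, forms B = L * X, solves for
   X with the variable-bound kernel, and returns the largest absolute
   error in the recovered X. Supports mr, nr <= 16. */
static double trsm_check(size_t mr, size_t nr)
{
    double l[16 * 16] = {0}, x[16 * 16] = {0}, b[16 * 16] = {0};
    assert(mr <= 16 && nr <= 16);
    for (size_t i = 0; i < mr; ++i)
        for (size_t k = 0; k <= i; ++k)
            l[i + k * mr] = (k == i) ? 2.0 + (double)i
                                     : 0.5 / (double)(i + k + 1);
    for (size_t i = 0; i < mr; ++i)
        for (size_t j = 0; j < nr; ++j)
            x[i * nr + j] = (double)(i + 1) - 0.25 * (double)j;
    for (size_t i = 0; i < mr; ++i)
        for (size_t j = 0; j < nr; ++j)
            for (size_t k = 0; k <= i; ++k)
                b[i * nr + j] += l[i + k * mr] * x[k * nr + j];
    trsm_l_ref_var(mr, nr, l, mr, b, nr);
    double err = 0.0;
    for (size_t i = 0; i < mr * nr; ++i) {
        double d = b[i] - x[i];
        if (d < 0.0) d = -d;
        if (d > err) err = d;
    }
    return err;
}
```

Calling `trsm_check(4, 4)`, `trsm_check(6, 8)`, or `trsm_check(8, 6)` exercises the same code on differently shaped tiles, which is why this style keeps working regardless of which optimized gemm microkernel a configuration registers. The trade-off is that the compiler cannot fully unroll loops whose trip counts are unknown at compile time.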
@@ -34,7 +34,7 @@
#include "blis.h"
#if 1
#if 0
// An implementation that attempts to facilitate emission of vectorized
// instructions via constant loop bounds + #pragma omp simd directives.
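The implementation described in the diff comment combines constant loop bounds with `#pragma omp simd` directives. The hypothetical sketch below (the macros `TRSM_MR`/`TRSM_NR` and the helper `const_solve_error` are illustrative, not taken from BLIS) hard-codes a 4 x 4 micro-tile to show why that approach conflicts with other register blocksizes. It solves a 4 x 4 system correctly, but when handed a 6 x 8 tile, as an optimized microkernel with those blocksizes might supply, it uses the wrong strides and leaves most of the tile unsolved. Had the constants been larger than the real tile, the same loops would read and write past the end of the caller's buffers, which is undefined behavior.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical compile-time micro-tile size (illustrative names). */
#define TRSM_MR 4
#define TRSM_NR 4

/* Solves L * X = B in place, assuming L is TRSM_MR x TRSM_MR lower
   triangular (column-major, leading dimension TRSM_MR) and B is
   TRSM_MR x TRSM_NR (row-major, row stride TRSM_NR). Constant trip counts
   let the compiler unroll; #pragma omp simd asks for the j loop to be
   vectorized (honored only with -fopenmp or -fopenmp-simd). */
static void trsm_l_ref_const(const double *l, double *b)
{
    for (size_t i = 0; i < TRSM_MR; ++i) {
        const double lii = l[i + i * TRSM_MR];
        #pragma omp simd
        for (size_t j = 0; j < TRSM_NR; ++j) {
            double beta = b[i * TRSM_NR + j];
            for (size_t k = 0; k < i; ++k)
                beta -= l[i + k * TRSM_MR] * b[k * TRSM_NR + j];
            b[i * TRSM_NR + j] = beta / lii;
        }
    }
}

/* Builds an mr x nr problem laid out for those *runtime* sizes, hands it
   to the constant-bound kernel, and returns the largest absolute error in
   X. The zero-initialized 16 x 16 buffers are larger than anything the
   kernel touches, so the mismatch yields wrong answers, not UB, here. */
static double const_solve_error(size_t mr, size_t nr)
{
    double l[16 * 16] = {0}, x[16 * 16] = {0}, b[16 * 16] = {0};
    assert(mr <= 16 && nr <= 16);
    for (size_t i = 0; i < mr; ++i)
        for (size_t k = 0; k <= i; ++k)
            l[i + k * mr] = (k == i) ? 2.0 + (double)i
                                     : 0.5 / (double)(i + k + 1);
    for (size_t i = 0; i < mr; ++i)
        for (size_t j = 0; j < nr; ++j)
            x[i * nr + j] = (double)(i + 1) - 0.25 * (double)j;
    for (size_t i = 0; i < mr; ++i)
        for (size_t j = 0; j < nr; ++j)
            for (size_t k = 0; k <= i; ++k)
                b[i * nr + j] += l[i + k * mr] * x[k * nr + j];
    trsm_l_ref_const(l, b);
    double err = 0.0;
    for (size_t i = 0; i < mr * nr; ++i) {
        double d = b[i] - x[i];
        if (d < 0.0) d = -d;
        if (d > err) err = d;
    }
    return err;
}
```

With matching sizes, `const_solve_error(4, 4)` is at rounding-error level; with `const_solve_error(6, 8)` the rows the kernel never visits still hold B rather than X, so the error is large. No amount of tuning inside the kernel fixes this, which is the sense in which constant-bound trsm code "can never work" with gemm ukernels that use different register blocksizes.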