Fixed BLAS test failures of small matrix SYRK for single and double precision.

Details:
- Small-matrix SYRK was implemented by reusing the small GEMM routine, which
  writes the full C matrix. Because C is symmetric, its lower and upper
  triangles both received the same results. The BLAS SYRK API spec requires
  that only the lower or the upper triangle of C be written, so the BLAS
  tests failed even though the BLIS testsuite passed the small SYRK
  operation.
- To fix the BLAS test failures, separate kernel routines are implemented for
  small SYRK in both single and double precision. The new routines are in
  kernels/zen/3/bli_syrk_small.c. Intermediate results for matrix C are now
  written to a scratch buffer, and the final results are copied from the
  scratch buffer into either the lower or the upper triangle of C using
  SIMD copies.
- Source and header files frame/3/syrk/bli_syrk_front.c and
  frame/3/syrk/bli_syrk_front.h are changed to invoke the new small SYRK
  routines.
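
The scratch-buffer approach described above can be illustrated with a minimal
sketch. This is not the code in kernels/zen/3/bli_syrk_small.c: the function
name syrk_small_sketch, its signature, and the column-major layout are
assumptions made for illustration, and plain scalar loops stand in for the
SIMD copy. It computes the full product into a scratch buffer and then
writes only the requested triangle of C:

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical illustration only; not the BLIS kernel.
 * Computes C := alpha * A * A^T + beta * C for a column-major n x k matrix A,
 * touching only the triangle of C selected by uplo ('L' or 'U').
 * Returns 0 on success, -1 on a bad argument or allocation failure. */
int syrk_small_sketch(char uplo, size_t n, size_t k,
                      double alpha, const double *a, size_t lda,
                      double beta, double *c, size_t ldc)
{
    if (uplo != 'L' && uplo != 'U') return -1;
    if (n == 0) return 0;

    /* Step 1: GEMM-style product into a full n x n scratch buffer. */
    double *scratch = malloc(n * n * sizeof *scratch);
    if (scratch == NULL) return -1;

    for (size_t j = 0; j < n; ++j) {
        for (size_t i = 0; i < n; ++i) {
            double sum = 0.0;
            for (size_t p = 0; p < k; ++p)
                sum += a[i + p * lda] * a[j + p * lda];
            scratch[i + j * n] = alpha * sum;
        }
    }

    /* Step 2: copy only the requested triangle from scratch into C.
     * In column j, the lower triangle is rows j..n-1 and the upper
     * triangle is rows 0..j. */
    for (size_t j = 0; j < n; ++j) {
        size_t first = (uplo == 'L') ? j : 0;
        size_t last  = (uplo == 'L') ? n : j + 1;
        for (size_t i = first; i < last; ++i) {
            /* With beta == 0 the old value of C is ignored entirely. */
            double old = (beta == 0.0) ? 0.0 : beta * c[i + j * ldc];
            c[i + j * ldc] = old + scratch[i + j * n];
        }
    }

    free(scratch);
    return 0;
}
```

With this structure, the elements of C outside the selected triangle keep
their original values, which is the behaviour the BLAS tests check for.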

Change-Id: I9cfb1116c93d150aefac673fca033952ecac97cb
sraut
2018-12-19 21:21:10 +05:30
committed by Devrajegowda, Kiran
parent 2752b51c37
commit d6bb56d088
2 changed files with 2 additions and 3 deletions

@@ -157,7 +157,6 @@ void bli_cntx_init_zen( cntx_t* cntx )
bli_blksz_init_easy( &blkszs[ BLIS_NC ], 4080, 4080, 4080, 4080 );
#else
bli_blksz_init_easy( &blkszs[ BLIS_MC ], 144, 240, 144, 72 );
bli_blksz_init_easy( &blkszs[ BLIS_KC ], 256, 512, 256, 256 );
bli_blksz_init_easy( &blkszs[ BLIS_NC ], 4080, 2040, 4080, 4080 );

@@ -5,6 +5,7 @@
libraries.
Copyright (C) 2014, The University of Texas at Austin
Copyright (C) 2018, Advanced Micro Devices, Inc.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
@@ -42,7 +43,6 @@ void bli_syrk_front
rntm_t* rntm,
cntl_t* cntl
);
err_t bli_syrk_small
(
obj_t* alpha,
@@ -52,4 +52,4 @@ err_t bli_syrk_small
obj_t* c,
cntx_t* cntx,
cntl_t* cntl
);
);