Add xdlops v4r4r4 into online compilation (#48)

* init for v4r4 xdlops olc

* refactor wrap

* init impl of v4r4 nchw xdlops olc

* tuning

* test perf

* fixed v4r4 nhwc

* tuned v4r4 nhwc

* use gridwise_gemm_xdlops_v2r3

* swap a/b

* add pointer support into offline v2r3

* debugging v4r4r4 transform for olc

* change timer of olc

* refactor v4r4 xdlops nchw olc

* remove transform fun in v4r4 xdlops nhwc olc

Co-authored-by: Chao Liu <chao.liu2@amd.com>
zjing14
2021-07-16 23:27:08 -05:00
committed by GitHub
parent 0a72e4df94
commit fbdf4332c7
28 changed files with 1851 additions and 3909 deletions

@@ -16,7 +16,7 @@
 #include "device_dynamic_convolution_backward_data_implicit_gemm_v4r1r2_xdlops_nhwc_kyxc_nhwk.hpp"
 #define USE_DYNAMIC_MODE 1
-#define USE_CONV_BWD_V4R1_XDL_NHWC 0
+#define USE_CONV_BWD_V4R1_XDL_NHWC 1
 #define USE_CONV_BWD_V4R1R2_XDL_NHWC 1
 enum ConvBackwardDataAlgo