* Add int8 mk_nk_mn gemm to ckProfiler
* Add example of int8 gemm
* Fix typo, use ushort instead of half_t for bfloat16
* replace ushortXXX_t with bhalfXXX_t
* rename ushort to bhalf_t
* Add bf16 example
* Add bf16 gemm to ckProfiler
* Fix alignment
* Fix typo
* Add unit test for gemm_xdl int8
* Add gemm_xdl fp32 unit test
* Add gemm_xdl bf16 unit test
* fix build
* fix build issue due to merge conflict
* Fix build
* Fix build error
Co-authored-by: rocking <chunylai@amd.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>
* start conv2d bwd api
* kernel running
* add bwd reference
* change to no shuffle
* fix bwd reference
* pass verification
* add Filter1x1Stride1Pad0 and start testing
* change some tuning parameter
* fix test error
* add fp16 tuning parameter
* add bf16 tuning parameter
* add int8 tuning parameters
* change fp32 tuning parameter
* add bwd to profiler
* fix bug in bwd profiler
* fix ckProfiler bug
* change conv2d_bwd_xdl to fp16
* fix bug in comments
* fix precompile id
* fix enum conv name
* change _bwd_ to _bwd_data_
* change conv2d_bwd example id
* bwd to bwd data
* fix prehead
* fix MakeDefaultBlock2CTileMap, imported from merge with develop
* format bwd instance
* bwd to bwd data
* change name bwd to bwd data
* change name bwd to bwd data in example
* format code
* change conv2d bwd data id in example
* rewrite readme for example
* fix CalculateMagicNumbers about div zero
* add workaround CK_WORKAROUND_SWDEV_325164
* change test_conv2d_bwd_data output info
* format
* fix bug in workaround CK_WORKAROUND_SWDEV_325164
* format tuning parameters
* format tuning parameters again
* format tuning parameters 3
* format tuning parameters 4
* remove add function template
* format
* update comment
Co-authored-by: ltqin <letaoqin@amd.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>
* init for splitk f16
* a working prototype
* debug
* perf debug
* update example
* instances for mk kn
* add instances for all layouts
* clean
* clean
* add tuning
* format
* add mn_padding into irregular tile
* clean
Co-authored-by: Chao Liu <chao.liu2@amd.com>
* [What] Refactor verification of gemm alpha_beta, move to reference operation
[Why] Sync with other verification
* Profile mk_nk for gemm bias 2d
* Support bias 2d with mk*kn in profiler
* Support bias 2d with km*kn and km*nk in profiler
* Support fp32 bias 2d in profiler
* format
* format
Co-authored-by: rocking <chunylai@amd.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>
* tweak conv for odd C
* update script
* clean up elementwise op
* fix build
* clean up
* added example for gemm+bias+relu+add
* added example for gemm+bias+relu
* add profiler for gemm_s_shuffle; re-org files
* add profiler
* fix build
* clean up
* clean up
* clean up
* fix build
* fix relu
* clean up
* clean up
* adding 1x1 conv
* adding 1x1 conv
* added 1x1 conv
* refactor
* refactor
* refactor
* added profiler for conv+bias+relu+add
* clean up
* adding conv+bias+relu
* adding conv+bias+relu
* added conv+bias+relu
* Update README.md
* update cpu verification
* adding c shuffle
* update static_tensor to handle invalid elements
* adding c shuffle
* debugging
* fix bug
* convert to fp16 before shuffle
* shuffle more than one M/NRepeat
* clean up
* remove coordinate step hack from GridwiseGemm_k0mk1_k0nk1_mn_xdlops_v3r1
* clean up
* remove coordinate step hack from all gridwise gemm xdl
* clean up coordinate step hack
* clean up coordinate step hack
* ThreadwiseTensorSliceTransfer_v3r2 support pointwise op on both src and dst
* adding output shuffle in conv+bias+relu+add
* update
* added conv+bias+relu+add with c shuffle
* added conv+bias+relu+add with c shuffle
* fix forward_sweep bugs in threadwise copy
* clean up
* refactor
* clean up
* clean up
* added conv_c_shuffle+bias_relu
* clean up
* added conv+bias+relu+atomic_add
* clean up
* clean up
* clean up
* clean up
* clean up
* clean up
* misc fixes; add 1x1 specialization
* clean up
* delete unused device op
* clean up
* add support for odd C value