Jianfeng Yan
c2e3fa5c91
Conv3d new ( #94 )
...
* conv3d compiles but has memory error
* conv3d works
* fix performance issue by using __builtin_amdgc_readfirstlane
* change MakeBlock2CTileMap to MakeDefaultBlock2CTileMap; change c_blockid_to* to cblockid_to*
* clang-format
* remove CK_EXPERIMENTAL_PASS_TENSOR_DECRIPTOR_BY_*; moved wrapper into DeviceConv3d
* format
* remove useless marc
* add comment
Co-authored-by: Chao Liu <chao.liu2@amd.com >
[ROCm/composable_kernel commit: 6dfb92bbef ]
2022-02-22 22:45:28 -06:00
ltqin
0d55b15355
NHWC conv 2d: fwd bfp16/int8, Device level tuning and host API ( #73 )
...
* add fwd bf16 conv
* change tunning parametor
* add int8 for conv fwd
* remove comments
* change tunning parametor for int8
* change init int8 example
* add test for conv2d fwd
* change device operation file pos because merge develop
* fwd int8 use reference
* test_conv_fwd use reference
* add braket for if statement
* rename fwd example name
* remove StaticBufferOfVectorTypeV2
* tweak example
Co-authored-by: ltqin <letaoqin@amd.com >
Co-authored-by: Chao Liu <chao.liu2@amd.com >
[ROCm/composable_kernel commit: 880fbee957 ]
2022-02-11 20:06:40 -06:00
ltqin
998217be22
add split-k GEMM ( #59 )
...
* add DeviceGemmSplitKXdl
* add file device_gemm_splitk_xdl.hpp
* set c matrix zero
* using atomic
* add all tuning parameter to f32 mkkn
* grid size change to 720
* add tunning parameter for NT
* add tunning parameter for TN
* add tunning parameter for TT
* add m=96tunning parameter
* add lost config
* add element wise operation
* fixed MPerBlock=96
* remove marco for slpitk swtich
* add test
* add new line at the end of device_gemm_xdl_instance.hpp
* remove step hack
* seperate split-k instance files
* add tunning parameters
* change disired grid size to parameters
* remove slice length
* add desiredgridsize parameter to ckProfiler
* add losting file device_gemm_xdl_splitk_instance.hpp
* change desired gride size to kbatch
* format
* format
* clean up
* add selection of device_instances
* clean code
* fix build issue
Co-authored-by: ltqin <letaoqin@amd.com >
Co-authored-by: Chao Liu <chao.liu2@amd.com >
Co-authored-by: Jing Zhang <jizhan@amd.com >
[ROCm/composable_kernel commit: 4be7f0198e ]
2022-02-02 22:47:27 -06:00
Chao Liu
7f14c82cd7
added test for magic number division ( #58 )
...
[ROCm/composable_kernel commit: 237d4ca03f ]
2021-11-30 09:09:28 -06:00