ltqin
0d55b15355
NHWC conv 2d: fwd bfp16/int8, Device level tuning and host API (#73)
* add fwd bf16 conv
* change tuning parameter
* add int8 for conv fwd
* remove comments
* change tuning parameter for int8
* change init int8 example
* add test for conv2d fwd
* change device operation file position because of develop merge
* fwd int8 use reference
* test_conv_fwd use reference
* add bracket for if statement
* rename fwd example name
* remove StaticBufferOfVectorTypeV2
* tweak example
Co-authored-by: ltqin <letaoqin@amd.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>
[ROCm/composable_kernel commit: 880fbee957]
2022-02-11 20:06:40 -06:00
zjing14
53c79a56c6
Add small tile size for fp16/fp32 and NN layout (#80)
* add DeviceGemmSplitKXdl
* add file device_gemm_splitk_xdl.hpp
* set c matrix zero
* using atomic
* add all tuning parameter to f32 mkkn
* grid size change to 720
* add tuning parameter for NT
* add tuning parameter for TN
* add tuning parameter for TT
* add m=96 tuning parameter
* add lost config
* debug
* fix sweep
* add failed tuning params
* fixed sweep logic
* clean
* add padding to M/N for irregular tile size
* clean code
* add element wise operation
* fixed MPerBlock=96
* remove macro for split-k switch
* add test
* add new line at the end of device_gemm_xdl_instance.hpp
* remove step hack
* separate split-k instance files
* add tuning parameters
* change desired grid size to parameters
* remove slice length
* add desiredgridsize parameter to ckProfiler
* add missing file device_gemm_xdl_splitk_instance.hpp
* change desired grid size to kbatch
* format
* format
* clean up
* add selection of device_instances
* clean code
* clean code
* add small tile size in fp16 nn
* test for rocm 4.5
* merge develop
* clean
* clean
* clean
* remove unused code
* add padding switch to device_gemm_xdl
* add padding switch for ksplit fp32
* clean
* clean
* add files
* rename
* Update profiler.cpp
* format
Co-authored-by: ltqin <letaoqin@amd.com>
Co-authored-by: ltqin <letao.qin@amd.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>
[ROCm/composable_kernel commit: 20a672d0b8]
2022-02-11 15:49:06 -06:00
zjing14
4795d9803d
Batched GEMM for fp16 (#79)
* prepare host for batched_gemm
* init commit of batched kernels
* fixed
* refine transform with freeze
* m/n padding
* fixed a bug; clean
* add small tiles
* clean
* clean code
* clean code
* add nt, tn, tt layout
* add missing file
* use StaticBufferTupleOfVector instead
* add reference_batched_gemm
* fixed a macro
[ROCm/composable_kernel commit: b53e9d08ed]
2022-02-11 09:36:52 -06:00
Chao Liu
fb387c0e82
GEMM+Bias+ReLU+Add (#76)
* tweak conv for odd C
* update script
* clean up elementwise op
* fix build
* clean up
* added example for gemm+bias+relu+add
* added example for gemm+bias+relu
* add profiler for gemm_s_shuffle; re-org files
* add profiler
* fix build
* clean up
* clean up
* clean up
* fix build
[ROCm/composable_kernel commit: 823657ed12]
2022-02-06 22:32:47 -06:00