zjing14
8a43beac2e
Split k f16 (#97)
* init for splitk f16
* a working prototype
* debug
* perf debug
* update example
* instances for mk kn
* add instances for all layers
* clean
* clean
* add tuning
* format
* add mn_padding into irregular tile
* clean
Co-authored-by: Chao Liu <chao.liu2@amd.com>
[ROCm/composable_kernel commit: e221d11e51]
2022-02-25 01:19:37 -06:00
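As a rough illustration of the M/N padding for irregular tiles in #97: assuming the kernel works on fixed MPerBlock x NPerBlock output tiles, M and N are rounded up to tile multiples, and a split-K launch multiplies the tile count by the K batch count. The helper names below are hypothetical, not composable_kernel APIs.

```cpp
#include <cassert>
#include <cstdint>

// Ceiling division for non-negative x and positive y.
inline std::int64_t integer_divide_ceil(std::int64_t x, std::int64_t y)
{
    assert(x >= 0 && y > 0);
    return (x + y - 1) / y;
}

// Round a GEMM dimension (M or N) up to a multiple of the tile size, as an
// M/N-padding scheme would before running a fixed-size tile kernel.
inline std::int64_t pad_to_tile(std::int64_t dim, std::int64_t per_block)
{
    return integer_divide_ceil(dim, per_block) * per_block;
}

// Hypothetical grid size: one workgroup per output tile, repeated for each of
// the kbatch slices of K in a split-K launch.
inline std::int64_t grid_size(std::int64_t M, std::int64_t N, std::int64_t m_per_block,
                              std::int64_t n_per_block, std::int64_t kbatch)
{
    return integer_divide_ceil(M, m_per_block) * integer_divide_ceil(N, n_per_block) * kbatch;
}
```

With 256 x 128 tiles, a 1000 x 1000 output is padded to 1024 x 1024 (4 x 8 tiles), so a split-K factor of 4 gives 128 workgroups.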
Chao Liu
66e4b458c4
Add gridwise GEMM pipeline (#89)
* clean up
* add multiple thread scratch to ThreadwiseTensorSliceTransfer_v3r1
* add 2 stage prefetch
* add more sanity check into transform_tensor_descriptor
* tweak
* enabling 2 stage prefetch to existing gridwise gemm; tweak
* enabling 2 stage prefetch to existing gridwise gemm
* move gridwise gemm pipeline in class; clean up
* add some irregular tile size
* update CalculateHasMainK0BlockLoop for multi-stage-prefetch
* refactor gridwise gemm pipeline class
[ROCm/composable_kernel commit: 22d438ae9e]
2022-02-23 17:23:49 -06:00
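The idea behind a prefetching GEMM pipeline (fetch the next K block while the current one is consumed) can be modeled on the host. The sketch below is a simplified double buffer with a prefetch distance of one block, assuming "loading a tile" is a copy into a staging buffer; on the host nothing actually overlaps, so only the order of operations is shown. It is not the CK pipeline class, whose stage count and synchronization differ, and `dot_two_stage_prefetch` is a hypothetical name.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Dot product over K, processed in blocks of k_per_block with two staging buffers.
float dot_two_stage_prefetch(const std::vector<float>& a, const std::vector<float>& b,
                             std::size_t k_per_block)
{
    assert(a.size() == b.size() && k_per_block > 0);
    const std::size_t K = a.size();
    const std::size_t num_blocks = (K + k_per_block - 1) / k_per_block;
    if (num_blocks == 0)
        return 0.f;

    struct Tile
    {
        std::vector<float> a, b;
    };
    std::array<Tile, 2> stage; // two staging buffers, used alternately

    // Model of a tile load: copy one K block of both operands into a staging buffer.
    auto load = [&](std::size_t blk, Tile& t) {
        const auto begin = static_cast<std::ptrdiff_t>(blk * k_per_block);
        const auto end   = static_cast<std::ptrdiff_t>(std::min(K, (blk + 1) * k_per_block));
        t.a.assign(a.begin() + begin, a.begin() + end);
        t.b.assign(b.begin() + begin, b.begin() + end);
    };

    load(0, stage[0]); // prologue: fetch the first block before the main loop
    float acc = 0.f;
    for (std::size_t blk = 0; blk < num_blocks; ++blk)
    {
        // Issue the load of block blk+1 into the other buffer before computing on
        // block blk; on a GPU this load would overlap with the math below.
        if (blk + 1 < num_blocks)
            load(blk + 1, stage[(blk + 1) % 2]);
        const Tile& cur = stage[blk % 2];
        for (std::size_t i = 0; i < cur.a.size(); ++i)
            acc += cur.a[i] * cur.b[i];
    }
    return acc;
}
```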
rocking5566
34ecda63ec
Gemm alpha beta profiler (fp32 & fp16) (#91)
* [What] Refactor verification of gemm alpha_beta, move to reference operation
[Why] Sync with other verification
* Profile mk_nk for gemm bias 2d
* Support bias 2d with mn * kn in profiler
* Support bias 2d with km*kn and km*nk in profiler
* Support fp32 bias 2d in profiler
* format
* format
Co-authored-by: rocking <chunylai@amd.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>
[ROCm/composable_kernel commit: 19c5d6e651]
2022-02-21 11:35:21 -06:00
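The alpha/beta GEMM checked by these reference operations can be sketched as a plain host loop. This assumes the operation is C = alpha * (A * B) + beta * C0, with C0 the 2-D bias, and shows only the row-major mk/kn layout; `host_gemm_alpha_beta` is a hypothetical name, not the CK reference operation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// C = alpha * (A * B) + beta * C0, all row-major:
// A is M x K, B is K x N, C0 (the 2-D bias) and C are M x N.
std::vector<float> host_gemm_alpha_beta(const std::vector<float>& A, const std::vector<float>& B,
                                        const std::vector<float>& C0, std::size_t M, std::size_t N,
                                        std::size_t K, float alpha, float beta)
{
    assert(A.size() == M * K && B.size() == K * N && C0.size() == M * N);
    std::vector<float> C(M * N);
    for (std::size_t m = 0; m < M; ++m)
        for (std::size_t n = 0; n < N; ++n)
        {
            float acc = 0.f; // full-precision accumulation of the A * B term
            for (std::size_t k = 0; k < K; ++k)
                acc += A[m * K + k] * B[k * N + n];
            C[m * N + n] = alpha * acc + beta * C0[m * N + n];
        }
    return C;
}
```

For the km or nk layouts listed above, only the A and B index expressions change.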
ltqin
32c128bcc5
NHWC conv 2d: fwd bfp16/int8, Device level tuning and host API (#73)
* add fwd bf16 conv
* change tuning parameter
* add int8 for conv fwd
* remove comments
* change tuning parameter for int8
* change init int8 example
* add test for conv2d fwd
* change device operation file position after merging develop
* fwd int8 use reference
* test_conv_fwd use reference
* add bracket for if statement
* rename fwd example name
* remove StaticBufferOfVectorTypeV2
* tweak example
Co-authored-by: ltqin <letaoqin@amd.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>
[ROCm/composable_kernel commit: 880fbee957]
2022-02-11 20:06:40 -06:00
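A host reference for the NHWC int8 forward convolution could look like the sketch below. It assumes KYXC weight layout, NHWK output layout, int32 accumulation, and a single stride and symmetric padding for both spatial dimensions; the function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// 2-D forward convolution: in is N x Hi x Wi x C, wei is K x Y x X x C,
// result is N x Ho x Wo x K. Products are accumulated in int32.
std::vector<std::int32_t> conv2d_fwd_nhwc_int8(const std::vector<std::int8_t>& in,
                                               const std::vector<std::int8_t>& wei,
                                               int N, int Hi, int Wi, int C,
                                               int K, int Y, int X, int stride, int pad)
{
    const int Ho = (Hi + 2 * pad - Y) / stride + 1;
    const int Wo = (Wi + 2 * pad - X) / stride + 1;
    assert(in.size() == static_cast<std::size_t>(N) * Hi * Wi * C);
    assert(wei.size() == static_cast<std::size_t>(K) * Y * X * C);
    std::vector<std::int32_t> out(static_cast<std::size_t>(N) * Ho * Wo * K, 0);

    for (int n = 0; n < N; ++n)
        for (int ho = 0; ho < Ho; ++ho)
            for (int wo = 0; wo < Wo; ++wo)
                for (int k = 0; k < K; ++k)
                {
                    std::int32_t acc = 0;
                    for (int y = 0; y < Y; ++y)
                        for (int x = 0; x < X; ++x)
                        {
                            const int hi = ho * stride + y - pad;
                            const int wi = wo * stride + x - pad;
                            if (hi < 0 || hi >= Hi || wi < 0 || wi >= Wi)
                                continue; // zero padding contributes nothing
                            for (int c = 0; c < C; ++c)
                                acc += static_cast<std::int32_t>(in[((n * Hi + hi) * Wi + wi) * C + c]) *
                                       static_cast<std::int32_t>(wei[((k * Y + y) * X + x) * C + c]);
                        }
                    out[((n * Ho + ho) * Wo + wo) * K + k] = acc;
                }
    return out;
}
```

A single int8 x int8 product can already reach 16384, so accumulation has to happen in a wider type such as int32.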
zjing14
e57c9a886f
Batched GEMM for fp16 (#79)
* prepare host for batched_gemm
* init commit of batched kernels
* fixed
* refine transform with freeze
* m/n padding
* fixed a bug; clean
* add small tiles
* clean
* clean code
* clean code
* add nt, tn, tt layout
* add missing file
* use StaticBufferTupleOfVector instead
* add reference_batched_gemm
* fixed a macro
[ROCm/composable_kernel commit: b53e9d08ed]
2022-02-11 09:36:52 -06:00
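A batched GEMM applies an independent GEMM to each of G matrix triples. The sketch below assumes contiguous batches (batch stride equal to the matrix size) and row-major layouts; `host_batched_gemm` is hypothetical and is not the `reference_batched_gemm` added in this commit.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// For each batch g: C[g] = A[g] * B[g], with A[g] M x K, B[g] K x N, C[g] M x N.
std::vector<float> host_batched_gemm(const std::vector<float>& A, const std::vector<float>& B,
                                     std::size_t G, std::size_t M, std::size_t N, std::size_t K)
{
    assert(A.size() == G * M * K && B.size() == G * K * N);
    std::vector<float> C(G * M * N, 0.f);
    for (std::size_t g = 0; g < G; ++g)
    {
        // Offset each operand by its batch stride.
        const float* a = A.data() + g * M * K;
        const float* b = B.data() + g * K * N;
        float* c       = C.data() + g * M * N;
        for (std::size_t m = 0; m < M; ++m)
            for (std::size_t n = 0; n < N; ++n)
            {
                float acc = 0.f;
                for (std::size_t k = 0; k < K; ++k)
                    acc += a[m * K + k] * b[k * N + n];
                c[m * N + n] = acc;
            }
    }
    return C;
}
```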
Chao Liu
8efcb80fa5
GEMM+Bias+ReLU+Add (#76)
* tweak conv for odd C
* update script
* clean up elementwise op
* fix build
* clean up
* added example for gemm+bias+relu+add
* added example for gemm+bias+relu
* add profiler for gemm_s_shuffle; re-org files
* add profiler
* fix build
* clean up
* clean up
* clean up
* fix build
[ROCm/composable_kernel commit: 823657ed12]
2022-02-06 22:32:47 -06:00
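The fused GEMM+Bias+ReLU+Add epilogue can be written as a reference loop. It assumes E = ReLU(A * B + bias) + D with the bias broadcast along M (one value per column n); this ordering and broadcast are assumptions for illustration, and the function name is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major A (M x K), B (K x N), D and E (M x N); bias has length N.
std::vector<float> host_gemm_bias_relu_add(const std::vector<float>& A, const std::vector<float>& B,
                                           const std::vector<float>& bias, const std::vector<float>& D,
                                           std::size_t M, std::size_t N, std::size_t K)
{
    assert(A.size() == M * K && B.size() == K * N && bias.size() == N && D.size() == M * N);
    std::vector<float> E(M * N);
    for (std::size_t m = 0; m < M; ++m)
        for (std::size_t n = 0; n < N; ++n)
        {
            float acc = 0.f;
            for (std::size_t k = 0; k < K; ++k)
                acc += A[m * K + k] * B[k * N + n];
            // Epilogue applied to the accumulator before the single write of E.
            E[m * N + n] = std::max(acc + bias[n], 0.f) + D[m * N + n];
        }
    return E;
}
```

Applying the epilogue to the accumulator is what makes the fusion pay off: the intermediate A * B result never round-trips through memory.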
ltqin
25d05d36c4
add split-k GEMM (#59)
* add DeviceGemmSplitKXdl
* add file device_gemm_splitk_xdl.hpp
* set c matrix zero
* using atomic
* add all tuning parameter to f32 mkkn
* grid size change to 720
* add tuning parameter for NT
* add tuning parameter for TN
* add tuning parameter for TT
* add m=96 tuning parameter
* add missing config
* add element wise operation
* fixed MPerBlock=96
* remove macro for split-k switch
* add test
* add new line at the end of device_gemm_xdl_instance.hpp
* remove step hack
* separate split-k instance files
* add tuning parameters
* change desired grid size to parameters
* remove slice length
* add desiredgridsize parameter to ckProfiler
* add missing file device_gemm_xdl_splitk_instance.hpp
* change desired grid size to kbatch
* format
* format
* clean up
* add selection of device_instances
* clean code
* fix build issue
Co-authored-by: ltqin <letaoqin@amd.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>
Co-authored-by: Jing Zhang <jizhan@amd.com>
[ROCm/composable_kernel commit: 4be7f0198e]
2022-02-02 22:47:27 -06:00
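The split-K scheme in this commit (zero-initialize C, then accumulate partial products with atomics, with the number of K slices given by kbatch) can be modeled serially on the host. The sketch assumes row-major A and B and splits K into kbatch nearly equal chunks; the function name is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// C = A * B computed as a sum of kbatch partial products over slices of K.
std::vector<float> host_gemm_split_k(const std::vector<float>& A, const std::vector<float>& B,
                                     std::size_t M, std::size_t N, std::size_t K, std::size_t kbatch)
{
    assert(A.size() == M * K && B.size() == K * N && kbatch > 0);
    std::vector<float> C(M * N, 0.f); // must start at zero: partials are added into it
    const std::size_t k_per_batch = (K + kbatch - 1) / kbatch;
    for (std::size_t kb = 0; kb < kbatch; ++kb) // on a GPU, each slice is its own workgroup(s)
    {
        const std::size_t k_begin = kb * k_per_batch;
        const std::size_t k_end   = std::min(K, k_begin + k_per_batch);
        for (std::size_t m = 0; m < M; ++m)
            for (std::size_t n = 0; n < N; ++n)
            {
                float partial = 0.f;
                for (std::size_t k = k_begin; k < k_end; ++k)
                    partial += A[m * K + k] * B[k * N + n];
                C[m * N + n] += partial; // stands in for an atomic add on the device
            }
    }
    return C;
}
```

Because split-K changes the order in which partial sums are added, its results can differ from an unsplit GEMM by floating-point rounding.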
Chao Liu
d6a0d8efcd
Fusion Conv+Bias+ReLU(+Add) (#62)
* fix relu
* clean up
* clean up
* adding 1x1 conv
* adding 1x1 conv
* added 1x1 conv
* refactor
* refactor
* refactor
* added profiler for conv+bias+relu+add
* clean up
* adding conv+bias+relu
* adding conv+bias+relu
* added conv+bias+relu
* Update README.md
* update cpu verification
* adding c shuffle
* update static_tensor for dealing with invalid element
* adding c shuffle
* debugging
* fix bug
* convert to fp16 before shuffle
* shuffle more than one M/NRepeat
* clean up
* remove coordinate step hack from GridwiseGemm_k0mk1_k0nk1_mn_xdlops_v3r1
* clean up
* remove coordinate step hack from all gridwise gemm xdl
* clean up coordinate step hack
* clean up coordinate step hack
* ThreadwiseTensorSliceTransfer_v3r2 support pointwise op on both src and dst
* adding output shuffle in conv+bias+relu+add
* update
* added conv+bias+relu+add with c shuffle
* added conv+bias+relu+add with c shuffle
* fix forward_sweep bugs in threadwise copy
* clean up
* refactor
* clean up
* clean up
* added conv_c_shuffle+bias_relu
* clean up
* added conv+bias+relu+atomic_add
* clean up
* clean up
* clean up
* clean up
* clean up
* clean up
* misc fixes; add 1x1 specialization
* clean up
* delete unused device op
* clean up
* add support for odd C value
[ROCm/composable_kernel commit: acbd7bd7c5]
2021-12-26 07:43:42 -07:00
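A 1x1 convolution specialization rests on a simple identity: a 1x1, stride-1, unpadded convolution in NHWC layout is a GEMM with M = N*H*W, N = K and K = C. The sketch below assumes KYXC weights (so a 1x1 filter is a K x C matrix); the function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Input N x H x W x C viewed as an (N*H*W) x C matrix, weights as K x C,
// output N x H x W x K as (N*H*W) x K: out[p][k] = sum_c in[p][c] * wei[k][c].
std::vector<float> conv1x1_nhwc_as_gemm(const std::vector<float>& in, const std::vector<float>& wei,
                                        std::size_t NHW, std::size_t C, std::size_t K)
{
    assert(in.size() == NHW * C && wei.size() == K * C);
    std::vector<float> out(NHW * K, 0.f);
    for (std::size_t p = 0; p < NHW; ++p)     // GEMM M dimension: every output pixel
        for (std::size_t k = 0; k < K; ++k)   // GEMM N dimension: output channels
        {
            float acc = 0.f;
            for (std::size_t c = 0; c < C; ++c) // GEMM K dimension: input channels
                acc += in[p * C + c] * wei[k * C + c];
            out[p * K + k] = acc;
        }
    return out;
}
```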
Chao Liu
9c85245412
GEMM/Conv+BiasAdd+ReLU+Add (#55)
* gemm+activation
* move C pointwise operation into threadwise copy
* add pointwise operation to A/B matrix
* update ckProfiler
* adding bias add
* adding bias add
* adding bias add
* added bias add; worked around compiler issues
* clean up
* clean up
* Update README.md
* Update README.md
* Update README.md
* clean up
* add conv_xdl example
* adding conv_xdl_bias_relu_add example
* add conv+bias+relu+add, but has register spill issue
* tweak
* tweak
* refactor
* Update README.md
update readme for example/2_gemm_xdl_bias_relu_add
* clean up
* Update README.md
update readme for example/3_conv_xdl
* Update README.md
[ROCm/composable_kernel commit: 41cdd3801a]
2021-12-02 20:07:37 -06:00
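Moving pointwise operations into the threadwise copy means each element is transformed as it is moved, instead of in a separate pass over memory. A minimal host analogue follows; the functor names and calling convention are hypothetical and are not CK's element-wise operation interface.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical element-wise functors.
struct PassThroughOp
{
    float operator()(float x) const { return x; }
};
struct ReluOp
{
    float operator()(float x) const { return std::max(x, 0.f); }
};

// Copy src to dst, applying op to each element on the way. Fusing the op into
// the copy avoids an extra read and write of the whole tensor.
template <typename ElementOp>
void copy_with_op(const std::vector<float>& src, std::vector<float>& dst, ElementOp op)
{
    assert(src.size() == dst.size());
    std::transform(src.begin(), src.end(), dst.begin(), op);
}
```

Passing the operation as a template parameter lets the compiler inline it into the copy loop, so a pass-through op costs nothing.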
Chao Liu
727098dfa0
fix layout naming convention (#56)
[ROCm/composable_kernel commit: 4041850f11]
2021-11-30 09:10:55 -06:00
zjing14
35a48e1bc6
add args for packed gemm (#54)
[ROCm/composable_kernel commit: 567f5e9c5f]
2021-11-24 12:33:55 -06:00
zjing14
43b1d325d4
fixed multiple definition issue of bfp16/fp32 conversion function when building ckProfiler (#51)
* fixed bfloat16 issues
* refactor type_convert
Co-authored-by: Chao Liu <chao.liu2@amd.com>
[ROCm/composable_kernel commit: 0a66c54e95]
2021-11-16 15:44:17 -06:00
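For context on the bf16/fp32 conversions involved, a common scheme treats bf16 as the upper 16 bits of an IEEE-754 float and rounds to nearest-even when narrowing. This is a sketch of that scheme, not necessarily what CK's `type_convert` does; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// bf16 -> float: place the 16 bits in the upper half of a 32-bit pattern.
inline float bf16_to_float(std::uint16_t h)
{
    const std::uint32_t u = static_cast<std::uint32_t>(h) << 16;
    float f;
    std::memcpy(&f, &u, sizeof(f)); // bit cast without strict-aliasing violations
    return f;
}

// float -> bf16 with round-to-nearest-even; NaN stays a (quiet) NaN.
inline std::uint16_t float_to_bf16_rne(float f)
{
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof(u));
    if ((u & 0x7F800000u) == 0x7F800000u && (u & 0x007FFFFFu) != 0)
        return static_cast<std::uint16_t>((u >> 16) | 0x0040u); // set the quiet-NaN bit
    const std::uint32_t lsb = (u >> 16) & 1u; // ties go toward an even result
    u += 0x7FFFu + lsb;
    return static_cast<std::uint16_t>(u >> 16);
}
```

Defining such helpers in a header without `inline` (or `static`) produces multiple-definition link errors once the header is included in more than one translation unit, which is one common cause of errors like the one fixed here.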
Chao Liu
b827099a27
FP16 data in-register transpose (#41)
* start fixing 16bit data packing
* adding StaticTensor
* adding StaticTensor
* adding StaticTensor
* add missing constexpr
* adding static tensor
* adding static tensor
* adding transpose
* add inline asm for transpose 2x2 of half_t
* add general transpose_vectors(), but it has unnecessary register initialization using v_mov
* fix unnecessary register initialization in transpose_vector by using more pass-by-reference
* add hardcoded logic for NHWC wrw
* improve asm for v_pack
* make ThreadwiseTensorSliceTransfer_v3r2 support any tensor
* tweak
* reorganize file
[ROCm/composable_kernel commit: b491ebf384]
2021-11-15 10:05:58 -06:00
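The 2x2 transpose of half_t can be modeled with bit operations: each 32-bit register holds two fp16 values, and transposing exchanges the high half of the first register with the low half of the second. This host sketch assumes element 0 sits in the low 16 bits; it does not reproduce the inline asm (v_pack) from the commit, and the names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Pack two 16-bit values into one 32-bit register: lo in bits 0-15, hi in bits 16-31.
inline std::uint32_t pack_lo_hi(std::uint16_t lo, std::uint16_t hi)
{
    return static_cast<std::uint32_t>(lo) | (static_cast<std::uint32_t>(hi) << 16);
}

// Input rows:  r0 = [a00, a01], r1 = [a10, a11]
// Output rows: t0 = [a00, a10], t1 = [a01, a11]
inline void transpose_2x2_b16(std::uint32_t r0, std::uint32_t r1, std::uint32_t& t0, std::uint32_t& t1)
{
    t0 = (r0 & 0x0000FFFFu) | (r1 << 16);  // low halves of both inputs
    t1 = (r0 >> 16) | (r1 & 0xFFFF0000u);  // high halves of both inputs
}
```

Working on the raw 16-bit patterns means no fp16 arithmetic is needed; the transpose is pure data movement.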
Chao Liu
b9f9ed96ac
ckProfiler and device-level XDL GEMM operator (#48)
* add DeviceGemmXdl
* update script
* fix naming issue
* fix comment
* output HostTensorDescriptor
* rename
* padded GEMM for fwd v4r4r4 nhwc
* refactor
* refactor
* refactor
* adding ckProfiler
* adding ckProfiler
* refactor
* fix tuning parameter bug
* add more gemm instances
* add more fp16 GEMM instances
* fix profiler driver
* fix bug in tuning parameter
* add fp32 gemm instances
* small fix
* refactor
* rename
* refactor gemm profiler; adding DeviceConv and conv profiler
* refactor
* fix
* add conv profiler
* refactor
* adding more GEMM and Conv instance
* Create README.md
Add build instruction for ckProfiler
* Create README.md
Add Readme for gemm_xdl example
* Update README.md
Remove build instruction from top most folder
* Update README.md
* clean up
[ROCm/composable_kernel commit: e823d518cb]
2021-11-14 11:28:32 -06:00
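A GEMM profiler typically reports throughput alongside run time. Assuming the usual conventions (2*M*N*K floating-point operations per GEMM, and A, B and C each touched exactly once in memory), the figures can be computed as follows; the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// TFLOP/s for an M x N x K GEMM that ran in time_ms milliseconds
// (one multiply and one add per term of each dot product).
inline double gemm_tflops(std::size_t M, std::size_t N, std::size_t K, double time_ms)
{
    assert(time_ms > 0.0);
    const double flop = 2.0 * M * N * K;
    return flop / (time_ms * 1e-3) / 1e12;
}

// Effective GB/s, assuming A (M x K), B (K x N) and C (M x N) are each
// read or written once, with bytes_per_elem bytes per element (2 for fp16).
inline double gemm_gb_per_s(std::size_t M, std::size_t N, std::size_t K,
                            std::size_t bytes_per_elem, double time_ms)
{
    assert(time_ms > 0.0);
    const double bytes = static_cast<double>(bytes_per_elem) * static_cast<double>(M * K + K * N + M * N);
    return bytes / (time_ms * 1e-3) / 1e9;
}
```

For example, a 1024 x 1024 x 1024 fp16 GEMM that takes 1 ms works out to about 2.15 TFLOP/s and 6.29 GB/s of compulsory traffic.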