composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-05-18 03:49:41 +00:00

Author	SHA1	Message	Date
ltqin	998217be22	add split-k GEMM (#59 ) * add DeviceGemmSplitKXdl * add file device_gemm_splitk_xdl.hpp * set c matrix zero * using atomic * add all tuning parameter to f32 mkkn * grid size change to 720 * add tunning parameter for NT * add tunning parameter for TN * add tunning parameter for TT * add m=96tunning parameter * add lost config * add element wise operation * fixed MPerBlock=96 * remove marco for slpitk swtich * add test * add new line at the end of device_gemm_xdl_instance.hpp * remove step hack * seperate split-k instance files * add tunning parameters * change disired grid size to parameters * remove slice length * add desiredgridsize parameter to ckProfiler * add losting file device_gemm_xdl_splitk_instance.hpp * change desired gride size to kbatch * format * format * clean up * add selection of device_instances * clean code * fix build issue Co-authored-by: ltqin <letaoqin@amd.com> Co-authored-by: Chao Liu <chao.liu2@amd.com> Co-authored-by: Jing Zhang <jizhan@amd.com> [ROCm/composable_kernel commit: `4be7f0198e`]	2022-02-02 22:47:27 -06:00
Chao Liu	886680ae94	Fusion Conv+Bias+ReLU(+Add) (#62 ) * fix relu * clean up * clean up * adding 1x1 conv * adding 1x1 conv * added 1x1 conv * refactor * refactor * refactor * added profiler for conv+bias+relu+add * clean up * adding conv+bias+relu * adding conv+bias+relu * added conv+bias+relu * Update README.md * update cpu verification * adding c shuffle * update static_tensor for dealing with invalid element * adding c shuffle * debugging * fix bug * convert to fp16 before shuffle * shuffle more than one M/NRepeat * clean up * remove coordinate step hack from GridwiseGemm_k0mk1_k0nk1_mn_xdlops_v3r1 * clean up * remove coordinate step hack from all gridwise gemm xdl * clean up coordinate step hack * clean up coordinate step hack * ThreadwiseTensorSliceTransfer_v3r2 support pointwise op on both src and dst * adding output shuffle in conv+bias+relu+add * update * added conv+bias+relu+add with c shuffle * added conv+bias+relu+add with c shuffle * fix forward_sweep bugs in threadwise copy * clean up * refactor * clean up * clean up * added conv_c_shuffle+bias_relu * clean up * added conv+bias+relu+atomic_add * clean up * clean up * clean up * clean up * clean up * clean up * misc fixes; add 1x1 specialization * clean up * delete unused device op * clean up * add support for odd C value [ROCm/composable_kernel commit: `acbd7bd7c5`]	2021-12-26 07:43:42 -07:00
Chao Liu	4a141b5a04	GEMM/Conv+BiasAdd+ReLU+Add (#55 ) * gemm+activation * move C pointwise operation into threadwise copy * add pointwise operation to A/B matrix * update ckProfiler * adding bias add * adding bias add * adding bias add * added bias add; worked around compiler issues * clean up * clean up * Update README.md * Update README.md * Update README.md * clean up * add conv_xdl example * adding conv_xdl_bias_relu_add example * add conv+bias+relu+add, but has register spill issue * tweak * tweak * refactor * Update README.md update readme for example/2_gemm_xdl_bias_relu_add * clean up * Update README.md update readme for example/3_conv_xdl * Update README.md [ROCm/composable_kernel commit: `41cdd3801a`]	2021-12-02 20:07:37 -06:00
Chao Liu	46ec6b3ef0	fix layout naming convention (#56 ) [ROCm/composable_kernel commit: `4041850f11`]	2021-11-30 09:10:55 -06:00
zjing14	220bc28498	add args for packed gemm (#54 ) [ROCm/composable_kernel commit: `567f5e9c5f`]	2021-11-24 12:33:55 -06:00
zjing14	1e7102575b	fixed multiple definition issue of bfp16/fp32 conversion function when building ckProfiler (#51 ) * fixed bfloat16 issues * refactor type_convert Co-authored-by: Chao Liu <chao.liu2@amd.com> [ROCm/composable_kernel commit: `0a66c54e95`]	2021-11-16 15:44:17 -06:00
Chao Liu	8791d26e52	FP16 data in-register transpose (#41 ) * start fixing 16bit data packing * adding StaticTensor * adding StaticTensor * adding StaticTensor * add missing constexpr * adding static tensor * adding static tensor * adding transpose * add inline asm for transpose 2x2 of half_t * add general transpose_vectors(), but have unnecessary register initialization using v_mov * fix unnecessary register initialization in transpose_vector by using more pass-by-reference * add hardcoded logic for NHWC wrw * improve asm for v_pack * make ThreadwiseTensorSliceTransfer_v3r2 support any tensor * tweak * reorganize file [ROCm/composable_kernel commit: `b491ebf384`]	2021-11-15 10:05:58 -06:00
Chao Liu	2f5ccb68f5	ckProfiler and device-level XDL GEMM operator (#48 ) * add DeviceGemmXdl * update script * fix naming issue * fix comment * output HostTensorDescriptor * rename * padded GEMM for fwd v4r4r4 nhwc * refactor * refactor * refactor * adding ckProfiler * adding ckProfiler * refactor * fix tuning parameter bug * add more gemm instances * add more fp16 GEMM instances * fix profiler driver * fix bug in tuning parameter * add fp32 gemm instances * small fix * refactor * rename * refactor gemm profiler; adding DeviceConv and conv profiler * refactor * fix * add conv profiler * refactor * adding more GEMM and Conv instance * Create README.md Add build instruction for ckProfiler * Create README.md Add Readme for gemm_xdl example * Update README.md Remove build instruction from top most folder * Update README.md * clean up [ROCm/composable_kernel commit: `e823d518cb`]	2021-11-14 11:28:32 -06:00

8 Commits