Anthony Chang
1450193e62
Tune & add conflict-free LDS gemm kernels (#159)
* retune & add conflict-free bf16/fp16 c-shuffle gemm instances
* amend wrong K1 value in some fp16/bf16 kernel instances
* make gemm cshuffle's timing behavior consistent with all other functions
* clang-format
* retune & add conflict-free fp32 c-shuffle gemm instances
* retune & add conflict-free int8 c-shuffle gemm instances
* update the underlying gridwise gemm of all c-shuffle gemm kernels
* typo
[ROCm/composable_kernel commit: 7db48f9008]
2022-03-31 12:58:41 -05:00
Chao Liu
3f732cceab
Compile for gfx908 and gfx90a (#130)
* adding compilation for multiple targets
* fix build
* clean
* update Jenkinsfile
* update readme
* update Jenkins
* use ck::half_t instead of ushort for bf16
* rename enum classes
* clean
* rename
* clean
[ROCm/composable_kernel commit: cd167e492a]
2022-03-31 12:33:34 -05:00
Jianfeng Yan
cb97ce68d8
Batched gemm and reduction (#156)
* adding batched_gemm_and_reduction
* batched_gemm_reduce works with batch_count=1
* fix a bug in grid_size; batched_gemm_reduce works for batch_count > 1
* adding profiler for batched_gemm_fp16
* fixed a bug in declaration of d1 and d0; both example and profiler work
* clang-format
* cleanup
* batched_gemm_reduce: add test
* minor change
* fixed some typos in function names
[ROCm/composable_kernel commit: 34c661e71c]
2022-03-30 11:21:18 -05:00
Chao Liu
d27a11cc78
Gemm+Reduce Fusion (#128)
* add gridwise gemm v4r1
* rename
* adding gemm+reduce
* adding gemm+reduce
* adding gemm+reduce
* adding gemm+reduce
* use sfc in shuffling
* remove hardcode
* remove hardcode
* refactor
* fix build
* adding gemm+reduce
* adding gemm+reduce
* adding gemm+reduce
* adding gemm+reduce
* adding gemm+reduce
* format
* clean
* adding gemm+reduce
* adding profiler for gemm+reduce
* adding gemm+reduce profiler
* fix build
* clean up
* gemm+reduce
* fix build
* update DeviceGemm_Xdl_CShuffle; update enum to enum class
* clean up
* add test for gemm+reduce
* clean up
* refactor
* fix build
* fix build
[ROCm/composable_kernel commit: f95267f166]
2022-03-23 22:18:42 -05:00
Chao Liu
82ad74304e
Reorganize files, Part 1 (#119)
* delete obsolete files
* move files
* build
* update cmake
* update cmake
* fix build
* reorg examples
* update cmake for example and test
[ROCm/composable_kernel commit: 5d37d7bff4]
2022-03-08 21:46:36 -06:00