Chao Liu
8cba08d07a
Gemm+Reduce Fusion ( #128 )
...
* add gridwise gemm v4r1
* rename
* adding gemm+reduce
* adding gemm+reduce
* adding gemm+reduce
* adding gemm+reduce
* use sfc in shuffling
* remove hardcode
* remove hardcode
* refactor
* fix build
* adding gemm+reduce
* adding gemm+reduce
* adding gemm+reduce
* adding gemm+reduce
* adding gemm+reduce
* format
* clean
* adding gemm+reduce
* adding profiler for gemm+reduce
* adding gemm+reduce profiler
* fix build
* clean up
* gemm+reduce
* fix build
* update DeviceGemm_Xdl_CShuffle; update enum to enum class
* clean up
* add test for gemm+reduce
* clean up
* refactor
* fix build
* fix build
[ROCm/composable_kernel commit: f95267f166 ]
2022-03-23 22:18:42 -05:00
Jianfeng Yan
225c98244d
Batched gemm bf16 ( #142 )
...
* add bf16 for batched gemm
* batched_gemm_bf16 works
* recover accidentally changed files
[ROCm/composable_kernel commit: d91f9f119c ]
2022-03-22 18:18:43 -05:00
Jianfeng Yan
aa4c28d53b
refactored deviceBatchedGemm; removed GridwiseBatchedGemm; added fp32 and int8 to profiler ( #120 )
...
changed long_index_t to index_t when computing memory offset
uncomment other ops in profiler
added test for batched_gemm
[ROCm/composable_kernel commit: cb87b049de ]
2022-03-21 16:45:14 -05:00
zjing14
4795d9803d
Batched GEMM for fp16 ( #79 )
...
* prepare host for batched_gemm
* init commit of batched kernels
* fixed
* refine transform with freeze
* m/n padding
* fixed a bug; clean
* add small tiles
* clean
* clean code
* clean code
* add nt, tn, tt layout
* add missing file
* use StaticBufferTupleOfVector instead
* add reference_batched_gemm
* fixed a macro
[ROCm/composable_kernel commit: b53e9d08ed ]
2022-02-11 09:36:52 -06:00