Commit Graph

4 Commits

Author SHA1 Message Date
Chao Liu
8cba08d07a Gemm+Reduce Fusion (#128)
* add gridwise gemm v4r1

* rename

* adding gemm+reduce

* adding gemm+reduce

* adding gemm+reduce

* adding gemm+reduce

* use sfc in shuffling

* remove hardcode

* remove hardcode

* refactor

* fix build

* adding gemm+reduce

* adding gemm+reduce

* adding gemm+reduce

* adding gemm+reduce

* adding gemm+reduce

* format

* clean

* adding gemm+reduce

* adding profiler for gemm+reduce

* adding gemm+reduce profiler

* fix build

* clean up

* gemm+reduce

* fix build

* update DeviceGemm_Xdl_CShuffle; update enum to enum class

* clean up

* add test for gemm+reduce

* clean up

* refactor

* fix build

* fix build

[ROCm/composable_kernel commit: f95267f166]
2022-03-23 22:18:42 -05:00
Jianfeng Yan
225c98244d Batched gemm bf16 (#142)
* add bf16 for batched gemm

* batched_gemm_bf16 works

* recover accidentally changed files

[ROCm/composable_kernel commit: d91f9f119c]
2022-03-22 18:18:43 -05:00
Jianfeng Yan
aa4c28d53b refactored deviceBatchedGemm; removed GridwiseBatchedGemm; added fp32 and int8 to profiler (#120)
changed long_index_t to index_t when computing memory offset

uncomment other ops in profiler

added test for batched_gemm

[ROCm/composable_kernel commit: cb87b049de]
2022-03-21 16:45:14 -05:00
zjing14
4795d9803d Batched GEMM for fp16 (#79)
* prepare host for batched_gemm

* init commit of batched kernels

* fixed

* refine transform with freeze

* m/n padding

* fixed a bug; clean

* add small tiles

* clean

* clean code

* clean code

* add nt, tn, tt layout

* add missing file

* use StaticBufferTupleOfVector instead

* add reference_batched_gemm

* fixed a macro

[ROCm/composable_kernel commit: b53e9d08ed]
2022-02-11 09:36:52 -06:00