Commit Graph

6 Commits

Author SHA1 Message Date
Anthony Chang
8bb6c6e120 use single threaded tensor generator (#161)
[ROCm/composable_kernel commit: f015c77687]
2022-03-30 22:28:30 -05:00
Jianfeng Yan
297ef9795d batched_gemm: use profiler in ctest (#163)
[ROCm/composable_kernel commit: c8f3acf9c0]
2022-03-30 21:32:49 -05:00
Jianfeng Yan
cb97ce68d8 Batched gemm and reduction (#156)
* adding batched_gemm_and_reduction

* batched_gemm_reduce works with batch_count=1

* fix a bug in grid_size; batched_gemm_reduce works for batch_count > 1

* adding profiler for batched_gemm_fp16

* fixed a bug in declaration of d1 and d0; both example and profiler work

* clang-format

* cleanup

* batched_gemm_reduce: add test

* minor change

* fixed some typos in function names

[ROCm/composable_kernel commit: 34c661e71c]
2022-03-30 11:21:18 -05:00
Jianfeng Yan
0d02cb3dfe Batched gemm bf16 (#142)
* add bf16 for batched gemm

* batched_gemm_bf16 works

* recover accidentally changed files

[ROCm/composable_kernel commit: d91f9f119c]
2022-03-22 18:18:43 -05:00
Jianfeng Yan
4ddc016c60 refactored deviceBatchedGemm; removed GridwiseBatchedGemm; added fp32 and int8 to profiler (#120)
changed long_index_t to index_t when computing memory offset

uncomment other ops in profiler

added test for batched_gemm

[ROCm/composable_kernel commit: cb87b049de]
2022-03-21 16:45:14 -05:00
zjing14
e57c9a886f Batched GEMM for fp16 (#79)
* prepare host for batched_gemm

* init commit of batched kernels

* fixed

* refine transform with freeze

* m/n padding

* fixed a bug; clean

* add small tiles

* clean

* clean code

* clean code

* add nt, tn, tt layout

* add missing file

* use StaticBufferTupleOfVector instead

* add reference_batched_gemm

* fixed a macro

[ROCm/composable_kernel commit: b53e9d08ed]
2022-02-11 09:36:52 -06:00