zjing14 4795d9803d Batched GEMM for fp16 (#79)
* prepare host for batched_gemm

* init commit of batched kernels

* fixed

* refine transform with freeze

* m/n padding

* fixed a bug; clean

* add small tiles

* clean

* clean code

* clean code

* add nt, tn, tt layout

* add missing file

* use StaticBufferTupleOfVector instead

* add reference_batched_gemm

* fixed a macro

[ROCm/composable_kernel commit: b53e9d08ed]
2022-02-11 09:36:52 -06:00
Description
[DEPRECATED] Moved to ROCm/rocm-libraries repo. NOTE: develop branch is maintained as a read-only mirror
Readme MIT 234 MiB
Languages
C++ 93.1%
Python 4.5%
CMake 1.5%
Shell 0.5%
Pawn 0.2%