Jianfeng Yan 87522d2ac9 Batched gemm and reduction (#156)
* adding batched_gemm_and_reduction

* batched_gemm_reduce works with bactch_count=1

* fix a bug in grid_size; batched_gemm_reduce works for batch_count > 1

* adding profiler for batched_gemm_fp16

* fixed a bug in declaration of d1 and d0; both example and profiler work

* clang-format

* cleanup

* batched_gemm_reduce: add test

* minor change

* fixed some typo in function names

[ROCm/composable_kernel commit: 34c661e71c]
2022-03-30 11:21:18 -05:00
2022-02-18 21:44:11 -06:00
2022-03-30 11:21:18 -05:00
2022-03-30 11:21:18 -05:00
2022-03-30 11:21:18 -05:00
2018-10-08 22:49:58 -05:00
2021-08-08 17:41:54 +00:00
2022-03-08 21:46:36 -06:00
Description
[DEPRECATED] Moved to ROCm/rocm-libraries repo. NOTE: develop branch is maintained as a read-only mirror
Readme MIT 234 MiB
Languages
C++ 93.1%
Python 4.5%
CMake 1.5%
Shell 0.5%
Pawn 0.2%