mirror of
https://github.com/ROCm/composable_kernel.git
synced 2026-05-19 20:40:07 +00:00
Add Gemm instances for performance improvement (#1018)
* improve kpad
* more tuning parameters
* f16_f8_fp16
* cut test time
* add f16_f8_fp16
* add f16_f8_f16
* testing instances for skinny cases
* format
* clean
* add fp16_f8_fp16
* clang-format
* add grouped gemm instalces
* fixed profile grouped_gemm
* clean
* clean
* clean
* clean
* clean
* add missing instance func
* fixed inferface
---------
Co-authored-by: Jing Zhang <jizha@amd.com>
Co-authored-by: root <root@sh5-1e707-rc06-38.mkm.dcgpu>
[ROCm/composable_kernel commit: 98fd41f597]
This commit is contained in:
@@ -143,8 +143,7 @@ bool profile_gemm_splitk_impl(int do_verification,
|
||||
// profile device GEMM instances
|
||||
for(auto& op_ptr : op_ptrs)
|
||||
{
|
||||
std::vector<int> kbatch_list = {1, 2, 4, 8, 12, 16, 20, 24, 32, 36, 40, 60,
|
||||
64, 72, 80, 88, 96, 128, 144, 160, 176, 192, 256};
|
||||
std::vector<int> kbatch_list = {1, 2, 4, 8, 12, 16, 20, 32, 36, 40, 64, 96, 128};
|
||||
|
||||
if(KBatch > 0)
|
||||
{
|
||||
|
||||
Reference in New Issue
Block a user