mirror of
https://github.com/ROCm/composable_kernel.git
synced 2026-04-20 06:49:15 +00:00
[GEMM] gemm_universal related optimization (#1453)
* replace buffer_atomic with global_atomic
* fixed global_atomic_add
* added bf16 atomic_add
* format
* clang-format-12
* clean
* clean
* add guards
* Update gtest.cmake
* enabled splitk_gemm_multi_d
* format
* add ckProfiler
* format
* fixed naming
* format
* clean
* clean
* add guards
* fix clang format
* format
* add kbatch printout
* clean
* Add rocm6.2 related gemm optimization
* Limit bf16 atomic usage
* remove redundant RCR gemm_universal instance
* Add RRR fp8 gemm universal instance
* Bug fix
* Add GPU_TARGET guard to FP8/BF8 target
* bug fix
* update cmake
* remove all fp8/bf8 example if arch not support
* Enable fp8 RRR support in ckProfiler
* limit greedy-reverse flag to gemm_universal in ckProfiler

---------

Co-authored-by: Jing Zhang <jizhan@fb.com>
Co-authored-by: Jing Zhang <jizhan@meta.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: illsilin <Illia.Silin@amd.com>
@@ -65,6 +65,12 @@ inline bool is_lds_direct_load_supported()
            ck::get_device_name() == "gfx941" || ck::get_device_name() == "gfx942";
 }
 
+inline bool is_bf16_atomic_supported()
+{
+    return ck::get_device_name() == "gfx940" || ck::get_device_name() == "gfx941" ||
+           ck::get_device_name() == "gfx942";
+}
+
 inline bool is_gfx101_supported()
 {
     return ck::get_device_name() == "gfx1010" || ck::get_device_name() == "gfx1011" ||
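The hunk above adds a device-name predicate, and the commit message mentions "Limit bf16 atomic usage". A minimal host-side sketch of how such a predicate might gate a split-K bf16 path is shown below. It is not taken from the commit: `is_bf16_atomic_supported_for` and `is_splitk_bf16_argument_supported` are hypothetical names, the device name is passed as a parameter instead of calling `ck::get_device_name()`, and the `k_batch > 1` rule is an assumption about where atomic accumulation is needed.

```cpp
#include <string_view>

// Hypothetical, testable variant of the predicate in the hunk: same three
// device names, but the name is a parameter rather than a runtime query.
inline bool is_bf16_atomic_supported_for(std::string_view device_name)
{
    return device_name == "gfx940" || device_name == "gfx941" || device_name == "gfx942";
}

// Hypothetical argument check (an assumption, not the commit's code):
// with k_batch > 1 the partial results are assumed to be combined with
// atomic adds, so a bf16 output requires bf16 atomic support on the device.
inline bool is_splitk_bf16_argument_supported(std::string_view device_name, int k_batch)
{
    if(k_batch > 1 && !is_bf16_atomic_supported_for(device_name))
    {
        return false; // caller should fall back to k_batch == 1 or another instance
    }
    return true;
}
```

Under these assumptions, a caller would run this check before launching and skip the instance when it returns false, so that unsupported targets never reach the atomic path.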