mirror of
https://github.com/ROCm/composable_kernel.git
synced 2026-05-14 02:02:46 +00:00
* add tensorwise quant in grouped gemm
* fix example issue
* update test cases
* format codes
* clang format
* use GTEST_FAIL
* add bquant to grouped_gemm
* add tensorwise quant in grouped gemm
* fix example issue
* update test cases
* format codes
* clang format
* use GTEST_FAIL
* fix a bug in test_grouped_gemm_util
* skip test when use wmma on grouped_quant kernel
* change cmake
* fix a bug in test_grouped_gemm_util
* skip test when use wmma on grouped_quant kernel
* change cmake
* tests(quant_grouped_gemm): add unit tests to cover bquant in grouped_gemm
* Update test/ck_tile/grouped_gemm_quant/test_grouped_gemm_util_quant.hpp
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update example/ck_tile/17_grouped_gemm/quant_grouped_gemm.hpp
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* feat: add bf8 support
* chore: remove unnecessary decltype usage
* chore: add default quant_mode to function signature as fallback
* fix: pass correct runtime pipeline params in grouped_gemm bquant kernel
Calculate has_hot_loop, num_loop, and tail_number on device side for each
GEMM problem instead of using default values. This fixes incorrect results
when different problems in the group have different K dimensions.
* chore: set default quant mode in function signature
* test: add additional test cases to cover edge case of no hotloop
* change code based on comments
* WIP: bquant preshuffle b compiles but gives numerical error
* feat(grouped_gemm_quant): bquant with preshuffleB support added to grouped_gemm example & kernel
* refactor: refactor code after merge commit
* chore: remove print statements
* test(grouped_gemm): split test cases by quant mode to reduce compilation time and add bquant-preshuffleB mode test cases
---------
Co-authored-by: kyle-256 <Kyle.Zhao@amd.com>
Co-authored-by: ThomasNing <thomas.ning@amd.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
[ROCm/composable_kernel commit: 8f1274d9b6]