* add tensorwise quant in grouped gemm
* fix example issue
* update test cases
* format codes
* clang format
* use GTEST_FAIL
* fix a bug in test_grouped_gemm_util
* skip test when use wmma on grouped_quant kernel
* change cmake
* change code based on comments
---------
Co-authored-by: ThomasNing <thomas.ning@amd.com>
* debugging
* debugging for prefill shapes
* comment unused code
* fix for prefill shapes
* clearing up the code
* add int4 to universal gemm example
* clang formatted
* adding test for prefill shapes in block scale gemm
* lil improv on the block pipeline
* Address Review Comment
---------
Co-authored-by: ThomasNing <thomas.ning@amd.com>
* [CK TILE GEMM] Support Aquant GEMM with transposeC and preshuffle
When TransposeC and QuantPreshuffle are both true, Aquant generates
correct result.
* [CK TILE GEMM] Support Aquant GEMM with transposeC and preshuffle
- Add unit tests
* Fix bug in is_quantpreshuffle_enabled
* clang format
---------
Co-authored-by: ThomasNing <thomas.ning@amd.com>
* initial commit
* remove extra files
* fixing errors
* updated ReadMe file for mapping of diff quants with diff configs
* addressing review comments
* addressing review comments
* Resolved merge conflicts
* [CK TILE GEMM] Replace get_preshuffle_or with is_quantpreshuffle_enabled
The get_preshuffle_or was not working as expected, which led to incorrect behavior
in the quantization preshuffle process. This change replaces it with the more reliable
is_quantpreshuffle_enabled function to properly determine when preshuffle should be applied.
* initial commit
* debugging
* working fp8 for init constant
* fp8 working with all inits
* updated block level code with comments
* changing the loop iter
* debugging
* debugging
* debugging
* code fix
* code clean up
* clang formatted
* Add comment
* code cleanup
* clang formatted
* merge conflicts fixes
* applying the latest int4 changes to the piepline
* fixing test code for updated traits
* Adding gtest
* review comments addressed
* addressing review comments
* remove c++20 code
* added flush cache changes
---------
Co-authored-by: Cong Ma <congma13@amd.com>
Co-authored-by: root <root@banff-cyxtera-s73-2.ctr.dcgpu>
* Remove C++20 code
C++20 features should not be used in CK. Remove all C++20 code.
* fix c++17 build
* format
* fix merge issue
---------
Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>
Co-authored-by: Max Podkorytov <4273004+tenpercent@users.noreply.github.com>
* Have a workable version for SGPR
* have a workable version for atomic add
* Revert "have a workable version for atomic add"
This reverts commit 792377a590c26cfff9c8f545d9a9e8484a7422eb.
* substitute with the new sgpr read api
* update the CHANGELOG
* have a workable version for atomic add
* Revert "have a workable version for atomic add"
This reverts commit 792377a590c26cfff9c8f545d9a9e8484a7422eb.
* change to static for logic
* have a workable version for atomic add
* Revert "have a workable version for atomic add"
This reverts commit 792377a590c26cfff9c8f545d9a9e8484a7422eb.
* rename gemm_group_quant to gemm_quant
* Add TensorWise quant mode
* Cshuffle epilogue tests with tensor scaling
* Add tensor quant to example
* Don't use readfirstlane for reading scales - doesn't work for some reason
* Add to changelog
* revert include - from a merge problem?
* revert common.hpp include
* revert host.hpp include
* remove unused utility function
* rename quant pipeline problem
* refactor quant tests
* remove aquant utils
* use TEST_F
* fix all tests by changing gemm config
* Use typed tests
* fix copyright