[CK_TILE] Fix CShuffleEpilogue test to use correct GEMM
accumulator distribution (#4518)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
## Summary
The test was using LDS distribution to create the accumulator tile, but
CShuffleEpilogue expects the GEMM accumulator distribution that
BlockGemm produces. This mismatch caused incorrect data permutation.
## Changes
- Use WarpGemmDispatcher to get correct accumulator distribution
encoding
- Load test input from host-initialized global memory for deterministic
verification
- Shard tests by data type (FP16, FP8) with gfx950-specific FP8 tests
- Extract scale tests into separate target for better organization
- Implement exact permutation verification (all unique values appear
once)
- Reduce tile size from 256x256 to 128x128 to fit in unique fp16 range
- Add parameterized test configurations for various warp layouts and
MFMA types
## Test plan
- [x] Run new cshuffle epilogue tests
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* Add cshuffle epilogue test
* add the poc implementation to the epilogue and tests
* refactor cshuffle epilogue
* WIP: adding tensor/tile usage to scale_tile
* fix usage of tile_elementwise_inout
* add gemm_quant_kernel for generalizing gemm quant kernel
* Add problem specific to different quants, add QuantType to Traits
* Add quant_type to quant_kernel template parameters
* Create aq/bq_block_windows and views depending on QuantType
* Use tile windows as inputs in cshuffle epilogue
* Fix some issues in epilogue
* initial new example code for new general gemm quant kernel test
* Fix issues in kernel
* Add verification check for rowcol Quantmode
* use AccDataType instead of AQ in pipeline
* fix aquant preshuffle
* fix formatting
* some cleanup
* remove gemm_aquant_basic.cpp
* remove gemm_aquant_kernel.hpp
* fix tests for the renamed quant kernel
* fix formatting
* clean example files
* fix some merge conflicts
* fix preshufflequant rename issue
* fix some templates after merging with develop
* fix test preshuffle parameter
* fix formatting
* Unify bquant kernel to the common quant kernel
* remove bquant kernel also from common header
* fix formatting
* clean up commented code
* fix formatting config hpp
* fix merge mistake
* Non-const for movable windows
* fix formatting
* Fix grammar in README
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>
* Remove #include<bit> and clean up example
* fix strides
* Add some descriptions for move_windows
---------
Co-authored-by: Mohsen Saffari <mohsen.saffari@amd.com>
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>