mirror of https://github.com/ROCm/composable_kernel.git
synced 2026-05-01 20:21:23 +00:00
* Factor out the three separate copies of load_interleaved_pk_type into a common utility class
* Add preprocessing to the weight preshuffle GEMM example, with optional cache flushing and clearing of the output for k_batch > 1
* Remove a duplicate function
* Add support for B tensor type pk_int4_t to the weight preshuffle GEMM, with tests included
* I4 support introduced more failing test cases that mirror the existing ones for F8
* Simplify the check for which tests to skip (they all have F8 as the A tensor type)
* Add a changelog entry
* Add a test for the v2 weight preshuffle (WP) pipeline, polish the code, and add int4 support to the v2 WP pipeline
* have a workable version for atomic add
* Revert "have a workable version for atomic add"

This reverts commit 792377a590c26cfff9c8f545d9a9e8484a7422eb.

---------

Co-authored-by: ThomasNing <thomas.ning@amd.com>