Ck moe mxfp4 blockm32 (#3098)

* block_m = 32

* ck block_m = 32

* aiter/3rdparty/composable_kernel/include/ck/tensor_operation/gpu/block/blockwise_gemm_pipeline_xdlops_b_preshuffle_mx_moe_v3.hpp format

* mxfp4_moe v1 pipe

* update format

---------

Co-authored-by: zhimding <zhimding@amd.com>
Co-authored-by: lalala-sh <Jiaxing.Wen@amd.com>
Co-authored-by: felix <felix.li@amd.com>
Xudong Yuan
2025-11-07 08:45:41 +08:00
committed by GitHub
parent 5f3cae3e28
commit d04eba4ae3
7 changed files with 1357 additions and 277 deletions


@@ -122,7 +122,7 @@ struct DeviceMoeGemmMXBPreShuffle : public DeviceMoEGemmMXBPreShuffle<ALayout,
 MPerXDL,
 NPerXDL,
 MXdlPerWave,
-NXdlPerWave_,
+math::max(2, NXdlPerWave_),
 ABlockTransferThreadClusterLengths_AK0_M_AK1,
 ABlockTransferThreadClusterArrangeOrder,
 ABlockTransferSrcAccessOrder,
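
The hunk above replaces a template argument with a compile-time floor: the value forwarded to the underlying kernel type is never less than 2, whatever `NXdlPerWave_` the caller supplies. The commit message does not state why the floor is needed. The standalone sketch below only illustrates the mechanism; `clamp_min`, `GridwiseGemmSketch`, and `DeviceOpSketch` are hypothetical names chosen for illustration and are not part of composable_kernel.

```cpp
// Hypothetical stand-in for a constexpr max such as the math::max used in the hunk.
// It must be constexpr so its result can serve as a template argument.
template <typename T>
constexpr T clamp_min(T floor, T value)
{
    return value < floor ? floor : value;
}

// Hypothetical kernel-side type that simply records the parameters it receives.
template <int MXdlPerWave, int NXdlPerWave>
struct GridwiseGemmSketch
{
    static constexpr int m_xdl_per_wave = MXdlPerWave;
    static constexpr int n_xdl_per_wave = NXdlPerWave;
};

// Hypothetical device-side wrapper: forwards MXdlPerWave unchanged,
// but raises NXdlPerWave_ to at least 2 before forwarding it.
template <int MXdlPerWave, int NXdlPerWave_>
struct DeviceOpSketch
{
    using Gridwise = GridwiseGemmSketch<MXdlPerWave, clamp_min(2, NXdlPerWave_)>;
};

// A requested value of 1 is raised to 2; values of 2 or more pass through.
static_assert(DeviceOpSketch<1, 1>::Gridwise::n_xdl_per_wave == 2, "floor applied");
static_assert(DeviceOpSketch<1, 4>::Gridwise::n_xdl_per_wave == 4, "larger value kept");
```

Because the clamp happens in the template argument list, two wrapper instantiations that differ only in a sub-floor `NXdlPerWave_` (for example 1 and 2) resolve to the same underlying kernel type in this sketch.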