[CK] MXFP4 MoE blockscale buf2lds version support (#2455)

* change cshuffle size

* added MXFP4 MoE async buffer loading without B preshuffle

* added MX MoE B shuffling + scale shuffling (async loads)

* minor fix

---------

Co-authored-by: mtgu0705 <mtgu@amd.com>
Mingtao Gu
2025-07-06 15:42:00 +08:00
committed by GitHub
parent 3d70c638d1
commit 7998ae8969
19 changed files with 10677 additions and 3473 deletions

File diff suppressed because it is too large.