composable_kernel/include/ck/tensor_operation/gpu/device/impl
Mingtao Gu 7998ae8969 [CK] Mxfp4 moe blockscale buf2lds version support (#2455)
* change cshuffle size

* added mxfp4 moe async buffer loading without B preshuffle

* added mx moe B shuffling + scale shuffling (async loads)

* minor fix

---------

Co-authored-by: mtgu0705 <mtgu@amd.com>
2025-07-06 15:42:00 +08:00