Mx fp6 flatmm (#3601)

* add fp6 data-type and support sync/async dwordx3 load/store

* clang-format

* pre-commit

* 1st commit

* default mnk pass ut

* fix a distribution

* fix

* fix bdram distr

* update

* pass ut

* improve perf

* update

* clean code

* resolve copilot comment

* resolve comment

* clang-format

---------

Co-authored-by: ZheWang <zhewan@amd.com>
ZheWang
2026-02-02 16:04:40 +08:00
committed by GitHub
parent 1ae83137eb
commit e6bcd192d4
21 changed files with 761 additions and 136 deletions


@@ -1614,7 +1614,8 @@ struct WarpGemmAttributeMfmaImpl_f32_16x16x128_f8f6f4
     return make_tuple(number<0>{}, int32x8_t{});
 else if constexpr(std::is_same_v<decltype(dtype), bf8_t>)
     return make_tuple(number<1>{}, int32x8_t{});
-// else if e2m3 => make_tuple(number<2>{}, int32x6_t{})
+else if constexpr(std::is_same_v<decltype(dtype), pk_fp6x16_t>)
+    return make_tuple(number<2>{}, pk_fp6x32_t{});
 // else if e3m2 => make_tuple(number<3>{}, int32x6_t{})
 else if constexpr(std::is_same_v<decltype(dtype), pk_fp4_t>)
     return make_tuple(number<4>{}, int32x4_t{});
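
The hunk above maps each element type to a format selector and a per-lane register type. A minimal standalone sketch of that dispatch pattern follows. It is not the library's code: every type here is a hypothetical stand-in whose layout is assumed (the real CK Tile definitions are not part of this diff), `select_format` is an illustrative name, and the `fp8_t` branch is assumed because its condition lies outside the hunk. The size checks encode one plausible reading of the register types: if each lane holds 32 K-elements (16x128 operand elements over an assumed 64-lane wavefront), then 8-bit formats need 8 dwords, 6-bit formats need 6 dwords, and 4-bit formats need 4 dwords.

```cpp
#include <cstdint>
#include <tuple>
#include <type_traits>

// Hypothetical stand-ins for the types named in the hunk (layouts assumed).
struct fp8_t {};
struct bf8_t {};
struct pk_fp4_t {};
struct pk_fp6x16_t { std::uint32_t data[3]; }; // 16 x 6 bit =  96 bit = 3 dwords
struct pk_fp6x32_t { std::uint32_t data[6]; }; // 32 x 6 bit = 192 bit = 6 dwords
struct int32x4_t   { std::int32_t  data[4]; };
struct int32x8_t   { std::int32_t  data[8]; };

template <int N>
using number = std::integral_constant<int, N>;

// Compile-time dispatch: element type -> (format selector, register type).
template <typename T>
constexpr auto select_format(T)
{
    if constexpr(std::is_same_v<T, fp8_t>) // assumed first branch
        return std::make_tuple(number<0>{}, int32x8_t{});
    else if constexpr(std::is_same_v<T, bf8_t>)
        return std::make_tuple(number<1>{}, int32x8_t{});
    else if constexpr(std::is_same_v<T, pk_fp6x16_t>) // the branch this commit enables
        return std::make_tuple(number<2>{}, pk_fp6x32_t{});
    else if constexpr(std::is_same_v<T, pk_fp4_t>)
        return std::make_tuple(number<4>{}, int32x4_t{});
    else
        static_assert(sizeof(T) == 0, "unsupported element type");
}

// 32 elements per lane at 8, 6 and 4 bits respectively.
static_assert(sizeof(int32x8_t) == 32 * 8 / 8);
static_assert(sizeof(pk_fp6x32_t) == 32 * 6 / 8);
static_assert(sizeof(int32x4_t) == 32 * 4 / 8);
// A 16-element fp6 pack is 12 bytes, i.e. one dwordx3 access.
static_assert(sizeof(pk_fp6x16_t) == 3 * sizeof(std::uint32_t));
```

Under these assumptions, `select_format(pk_fp6x16_t{})` yields a tuple whose first element type is `number<2>` and whose second is `pk_fp6x32_t`. The 12-byte size of `pk_fp6x16_t` is consistent with the dwordx3 load/store support mentioned in the commit message.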