Support transposed C tile in Aquant (#2679)

Enabling transposed C improves Aquant performance.

With transposed C enabled, AQ elements no longer need to be exchanged
among lanes, because each thread holds data from only one row.
Cong Ma
2025-08-28 14:28:09 -06:00
committed by GitHub
parent 0758883fa4
commit 428090f749
10 changed files with 276 additions and 154 deletions


@@ -90,6 +90,7 @@ float gemm_calc_aquant(const ck_tile::AQuantGemmHostArgs& args, const ck_tile::s
CodegenGemmShape,
CodegenGemmTraits,
QuantGroupSize,
transposed_warp_gemm,
ComputeDataType,
ck_tile::GemmPipelineScheduler::Intrawave,
has_hot_loop_v,