cutlass/python/CuTeDSL
xiangg-nv becfce08cd Enable tcgen05 blockscaled ops on Thor SM110 (#3283)
Edge-LLM NvFP4 MoE CuTeDSL kernels on Thor use tcgen05 blockscaled MMA and SMEM-to-TMEM scale-factor copies. The existing checks only admitted the SM100/SM103 paths, so source-built CuTeDSL rejected SM110.

Admit Thor's blockscaled MMA arch aliases sm_101a and sm_110a, and allow the SM110f family for S2T tcgen05 copy ops.
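A minimal sketch of the kind of allow-list check this change widens. This is not the actual CuTeDSL source: the set names, the helper functions, and the pre-existing sm_100a/sm_103a and sm_100f/sm_103f entries are assumptions made for illustration. Only sm_101a, sm_110a, and the SM110f family come from the commit message.

```python
# Hypothetical allow-lists; the real checks live in
# cutlass/cute/nvgpu/tcgen05/mma.py and copy.py and may look different.
BLOCKSCALED_MMA_ARCHS = frozenset({
    "sm_100a",  # assumed pre-existing entry
    "sm_103a",  # assumed pre-existing entry
    "sm_101a",  # Thor alias admitted by this change
    "sm_110a",  # Thor alias admitted by this change
})

S2T_COPY_FAMILIES = frozenset({
    "sm_100f",  # assumed pre-existing entry
    "sm_103f",  # assumed pre-existing entry
    "sm_110f",  # family admitted by this change
})


def admits_blockscaled_mma(arch: str) -> bool:
    """Return True if the (hypothetical) check accepts this arch string."""
    return arch in BLOCKSCALED_MMA_ARCHS


def admits_s2t_copy(family: str) -> bool:
    """Return True if the (hypothetical) check accepts this arch family."""
    return family in S2T_COPY_FAMILIES


if __name__ == "__main__":
    for arch in ("sm_100a", "sm_101a", "sm_110a", "sm_90a"):
        print(arch, admits_blockscaled_mma(arch))
    print("sm_110f", admits_s2t_copy("sm_110f"))
```

Before the change, a check of this shape would have rejected sm_110a at trace time even though the hardware supports the op, which matches the rejection described above.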

Validation:

- git diff --check
- python3 -m py_compile python/CuTeDSL/cutlass/cute/nvgpu/tcgen05/mma.py python/CuTeDSL/cutlass/cute/nvgpu/tcgen05/copy.py
- DKG grouped_blockscaled_gemm.py documented 4-group example on Thor SM110: PASS
- Edge-LLM nvfp4_moe AOT for sm_110/aarch64: 12/12 variants PASS
2026-06-16 15:01:25 +08:00