mirror of
https://github.com/NVIDIA/cutlass.git
synced 2026-06-29 02:47:05 +00:00
Edge-LLM NvFP4 MoE CuTeDSL kernels on Thor use tcgen05 blockscaled MMA and SMEM-to-TMEM scale-factor copies. The existing checks only admitted the SM100/SM103 paths, so source-built CuTeDSL rejected SM110.

Admit Thor's blockscaled MMA arch aliases sm_101a and sm_110a, and allow the SM110f family for S2T tcgen05 copy ops.

Validation:
- git diff --check
- python3 -m py_compile python/CuTeDSL/cutlass/cute/nvgpu/tcgen05/mma.py python/CuTeDSL/cutlass/cute/nvgpu/tcgen05/copy.py
- DKG grouped_blockscaled_gemm.py documented 4-group example on Thor SM110: PASS
- Edge-LLM nvfp4_moe AOT for sm_110/aarch64: 12/12 variants PASS
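
For readers unfamiliar with this kind of gate, the admission change described above can be sketched roughly as follows. This is an illustration only, not the code in mma.py or copy.py: the set names, function names, and the exact spelling of the pre-existing SM100/SM103 entries are hypothetical assumptions. Only sm_101a, sm_110a, and the SM110f family come from this commit message.

```python
# Illustrative sketch only. All names below are hypothetical and are NOT
# the real CuTeDSL API; the SM100/SM103 strings are assumed spellings.

# Assumed state before the change: only SM100/SM103 paths admitted.
OLD_BLOCKSCALED_MMA_ARCHS = frozenset({"sm_100a", "sm_103a"})
OLD_S2T_COPY_FAMILIES = frozenset({"sm_100f", "sm_103f"})

# After the change: Thor's aliases and the SM110f family are also admitted.
NEW_BLOCKSCALED_MMA_ARCHS = OLD_BLOCKSCALED_MMA_ARCHS | {"sm_101a", "sm_110a"}
NEW_S2T_COPY_FAMILIES = OLD_S2T_COPY_FAMILIES | {"sm_110f"}


def admits_blockscaled_mma(arch: str, allowed=NEW_BLOCKSCALED_MMA_ARCHS) -> bool:
    """Return True if `arch` may use the blockscaled MMA op (hypothetical check)."""
    return arch in allowed


def admits_s2t_copy(family: str, allowed=NEW_S2T_COPY_FAMILIES) -> bool:
    """Return True if `family` may use S2T tcgen05 copy ops (hypothetical check)."""
    return family in allowed


if __name__ == "__main__":
    for arch in ("sm_100a", "sm_101a", "sm_110a"):
        before = admits_blockscaled_mma(arch, OLD_BLOCKSCALED_MMA_ARCHS)
        after = admits_blockscaled_mma(arch)
        print(f"{arch}: before={before} after={after}")
```

Under these assumptions, sm_101a and sm_110a are rejected by the old sets and accepted by the new ones, which mirrors the rejection that source-built CuTeDSL hit on SM110.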