cutlass

mirror of https://github.com/NVIDIA/cutlass.git synced 2026-06-29 02:47:05 +00:00

Files

xiangg-nv becfce08cd Enable tcgen05 blockscaled ops on Thor SM110 (#3283)

Edge-LLM NvFP4 MoE CuTeDSL kernels on Thor use tcgen05 blockscaled MMA and SMEM-to-TMEM scale-factor copies. The existing checks only admitted the SM100/SM103 paths, so source-built CuTeDSL rejected SM110.

Admit Thor's blockscaled MMA arch aliases sm_101a and sm_110a, and allow the SM110f family for S2T tcgen05 copy ops.
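The kind of admission check described above can be sketched as follows. This is an illustration only, not the CuTeDSL source: the function and constant names are hypothetical, and the previously admitted arch strings ("sm_100a", "sm_103a") and family names are assumptions inferred from the "SM100/SM103 paths" wording. Only sm_101a, sm_110a and the SM110f family come from the commit message.

```python
# Hypothetical sketch of an arch admission check; none of these names
# are taken from the CuTeDSL code base.

# Assumed: arch strings admitted before the change (SM100/SM103 paths).
PREVIOUS_MMA_ARCHS = frozenset({"sm_100a", "sm_103a"})
# From the commit message: Thor's blockscaled MMA arch aliases.
THOR_MMA_ARCHS = frozenset({"sm_101a", "sm_110a"})

BLOCKSCALED_MMA_ARCHS = PREVIOUS_MMA_ARCHS | THOR_MMA_ARCHS

# Assumed family names for S2T copy ops; "sm_110" is the addition.
S2T_COPY_FAMILIES = frozenset({"sm_100", "sm_103", "sm_110"})


def arch_family(arch: str) -> str:
    """Strip a trailing 'a' or 'f' suffix, e.g. 'sm_110f' -> 'sm_110'."""
    return arch[:-1] if arch and arch[-1] in "af" else arch


def admits_blockscaled_mma(arch: str) -> bool:
    """Exact-match check against the admitted MMA arch strings."""
    return arch in BLOCKSCALED_MMA_ARCHS


def admits_s2t_copy(arch: str) -> bool:
    """Family-level check: any arch whose family is admitted passes."""
    return arch_family(arch) in S2T_COPY_FAMILIES
```

In this sketch the MMA check is an exact match on the arch string while the copy check works at the family level, mirroring the distinction the message draws between arch aliases and the SM110f family.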

Validation:

- git diff --check

- python3 -m py_compile python/CuTeDSL/cutlass/cute/nvgpu/tcgen05/mma.py python/CuTeDSL/cutlass/cute/nvgpu/tcgen05/copy.py

- DKG grouped_blockscaled_gemm.py documented 4-group example on Thor SM110: PASS

- Edge-LLM nvfp4_moe AOT for sm_110/aarch64: 12/12 variants PASS
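The py_compile step in the list above can also be run from Python instead of the command line. The following is a minimal sketch, assuming it is invoked from the repository root; the helper name failed_compiles is hypothetical, and the two paths are copied from the validation list.

```python
import py_compile

# Paths copied from the validation list, relative to the repository root.
FILES = [
    "python/CuTeDSL/cutlass/cute/nvgpu/tcgen05/mma.py",
    "python/CuTeDSL/cutlass/cute/nvgpu/tcgen05/copy.py",
]


def failed_compiles(paths):
    """Return (path, message) pairs for files that fail to byte-compile."""
    failures = []
    for path in paths:
        try:
            # doraise=True raises PyCompileError instead of printing the error.
            py_compile.compile(path, doraise=True)
        except (py_compile.PyCompileError, OSError) as exc:
            # OSError covers missing or unreadable files.
            failures.append((path, str(exc)))
    return failures
```

Calling failed_compiles(FILES) returns an empty list when both files compile cleanly, which corresponds to the command-line check exiting with status 0.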

2026-06-16 15:01:25 +08:00

Name                      Last commit                                            Date
cutlass                   Enable tcgen05 blockscaled ops on Thor SM110 (#3283)   2026-06-16 15:01:25 +08:00
EULA.txt                  Release v4.0.0 (#2294)                                 2025-05-13 15:55:29 -04:00
prep_editable_install.py  v4.6 dev update. (#3315)                               2026-06-15 23:23:20 -04:00
pyproject.toml            v4.4 tag release update. (#3032)                       2026-02-13 23:27:58 -05:00
requirements-cu13.txt     v4.6 dev update. (#3315)                               2026-06-15 23:23:20 -04:00
requirements.txt          v4.6 dev update. (#3315)                               2026-06-15 23:23:20 -04:00
setup.sh                  v4.5 tag update (#3202)                                2026-05-05 20:55:27 -04:00