cutlass

mirror of https://github.com/NVIDIA/cutlass.git synced 2026-05-11 17:00:05 +00:00

Files

Ali Hassani d1ef0e87f2 DistGEMM bug fixes (#2713)

* Blackwell DistGEMM bug fixes

1. When a preferred cluster is used, a separate branch is needed so that
   the universal GEMM wrapper finds the correct base params.
2. On Blackwell, workspace sizes can change depending on the problem
   shape, and DistGEMM previously evaluated the workspace size using the
   per-device shape instead of the per-GEMM shape.
3. The flattened size used to initialize host tensors can overflow (in
   the Hopper example as well).
4. Preferred and fallback cluster args need to be set explicitly;
   otherwise, if someone modifies the example to use a preferred cluster,
   it will fail.

* Fix example runtimes

* Set default fallback cluster shapes to the static ones

2025-11-06 13:31:24 -05:00
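Item 3 of the commit above concerns integer overflow when the flattened (total element) count of a host tensor is computed as a product of extents. The sketch below is a minimal, non-authoritative illustration that assumes the extents are held as 32-bit `int` values. `flattened_size` and `fits_in_int32` are hypothetical helpers, not CUTLASS APIs, and the actual fix in the DistGEMM examples may be implemented differently.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper (not a CUTLASS API): flattened element count of an
// M x K x L tensor. Multiplying the extents as 32-bit ints would overflow
// once M*K*L exceeds INT32_MAX, and signed overflow is undefined behavior
// in C++. Widening each extent to int64_t before multiplying keeps the
// product exact for realistic problem shapes.
inline int64_t flattened_size(int m, int k, int l) {
  return static_cast<int64_t>(m) * static_cast<int64_t>(k) *
         static_cast<int64_t>(l);
}

// Hypothetical helper: reports whether a count would still fit in a
// 32-bit signed int, i.e. whether the narrow computation would have been safe.
inline bool fits_in_int32(int64_t n) {
  return n >= 0 && n <= std::numeric_limits<int32_t>::max();
}
```

For example, a 65536 x 65536 x 1 tensor has 2^32 elements, which does not fit in a 32-bit `int` but is represented exactly by the widened product.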

cute

v4.3 update (#2709)

2025-10-21 14:26:30 -04:00

cutlass

DistGEMM bug fixes (#2713)

2025-11-06 13:31:24 -05:00