mscclpp/include
Binyang Li 893a08e69c Enable MNNVL allreduce tuning
Add an MNNVL (Multi-Node NVLink) rank-domain override so MSCCL++ collectives can treat multi-host NVLink fabrics as a single CUDA IPC/NVLS peer group. Update the packet, RSAG, and NVLS allreduce paths to use the collective domain size, and teach the torch integration tuning example to select MNNVL-capable allreduce algorithms.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-28 05:38:59 +00:00