mirror of
https://github.com/microsoft/mscclpp.git
synced 2026-05-11 17:00:22 +00:00
Add an MNNVL rank-domain override so MSCCL++ collectives can treat multi-host NVLink fabrics as a single CUDA IPC/NVLS peer group. Update packet, RSAG, and NVLS allreduce paths to use the collective domain size and teach the torch integration tuning example to select MNNVL-capable allreduce algorithms. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>