mscclpp/include
Binyang Li 982b7ae230 Add SwitchGroupSemaphore for O(1) multicast signal/wait
Implement signal/wait synchronization for the switch (multicast) group
using multimem.red hardware reduction on the NVSwitch, matching the
approach used by NCCL.

- Signal: multimem.red.release.sys.global.add.u32 on the multicast flag
  address atomically increments the flag on all peers via a single
  multicast reduction operation.
- Wait: polls the local device flag pointer until every device has
  signaled, i.e. until the flag reaches numDevices * signalCount, where
  signalCount is the number of signal rounds issued so far.
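
The counting behind signal and wait can be illustrated with a small CPU
model. This is only a sketch under stated assumptions: ModelGroup and
signalsUntilReleased are hypothetical names, not part of MSCCL++; the
multicast reduction is modeled as a plain loop over per-device counters;
and the ">=" comparison with no wraparound handling is an assumption
about how "reaches" is evaluated.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical CPU model of the counting scheme; NOT the MSCCL++ API.
struct ModelGroup {
  std::vector<uint32_t> flags;     // one flag word per device
  std::vector<uint32_t> expected;  // per-device expected inbound count

  explicit ModelGroup(uint32_t numDevices)
      : flags(numDevices, 0), expected(numDevices, 0) {
    assert(numDevices > 0);
  }

  uint32_t numDevices() const { return static_cast<uint32_t>(flags.size()); }

  // Models the multicast add: one signal increments the flag on every
  // device. On NVSwitch this is a single reduction; here it is a loop.
  void signal() {
    for (uint32_t& f : flags) f += 1;
  }

  // Starts a new wait on device `dev`: the target grows by numDevices
  // per round, so after k rounds it equals numDevices * k.
  void beginWait(uint32_t dev) { expected[dev] += numDevices(); }

  // One polling step of the wait loop on device `dev`.
  bool poll(uint32_t dev) const { return flags[dev] >= expected[dev]; }
};

// Runs one round and returns how many signals had to arrive before
// device 0's wait completed.
uint32_t signalsUntilReleased(ModelGroup& g) {
  g.beginWait(0);
  uint32_t sent = 0;
  while (!g.poll(0)) {
    g.signal();
    ++sent;
  }
  return sent;
}
```

In this model, a group of 4 devices releases device 0 only after the
fourth signal of each round, and every flag holds 8 after two rounds.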

New types:
- SwitchGroupSemaphoreDeviceHandle: device-side handle with signal(),
  relaxedSignal(), wait(), and relaxedWait() methods.
- SwitchGroupSemaphore: host-side class that manages the flag channel
  and expected inbound counter.

Also adds a GroupSignalWait test that verifies all-rank GPU-side
barrier synchronization using the new semaphore.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-07 18:31:08 +00:00