Binyang Li dc0b8d75f3 GB200 support: SendRecv DSL collective and per-channel executor connections (#810)
## Summary
 
GB200 support work: introduces point-to-point send/receive in the
MSCCL++ DSL and extends the executor for split-NVL-domain topologies,
where some ranks are NVL-connected within a node and other ranks must
communicate across the network.
 
 ### DSL
 - New `SendRecv` collective with separate input/output buffers
   (`python/mscclpp/language/collectives.py`).
 - New multi-node sendrecv DSL example
   (`python/mscclpp/language/tests/multi_node/send_recv.py`) with
   `--split_mask` (group size − 1) and `--instances` CLI options.
   Documents the channel-ordering trick that keeps signal tags
   cross-matched between paired peers when `prev == next`.
 - `BaseBuffer.__getitem__` now accepts slices with `None` start/stop
   (e.g., `buf[:]`).
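
For reviewers unfamiliar with the slice change, the sketch below shows one way open-ended slices can be normalized to an offset and a length. `ToyBuffer` and its `(offset, length)` return value are hypothetical and are not the actual `BaseBuffer` implementation; the only behavior taken from this change is that `None` start/stop bounds are accepted.

```python
class ToyBuffer:
    """Hypothetical stand-in for a DSL buffer; not the MSCCL++ BaseBuffer."""

    def __init__(self, size):
        self.size = size

    def __getitem__(self, key):
        if not isinstance(key, slice):
            raise TypeError("ToyBuffer only supports slice indexing")
        if key.step not in (None, 1):
            raise ValueError("only contiguous slices are supported")
        # A missing bound means "from the start" or "to the end" of the buffer.
        start = 0 if key.start is None else key.start
        stop = self.size if key.stop is None else key.stop
        if not 0 <= start <= stop <= self.size:
            raise IndexError(f"slice [{start}:{stop}] out of range for size {self.size}")
        # Assumed representation of a chunk: (offset, length).
        return (start, stop - start)


buf = ToyBuffer(8)
whole = buf[:]    # both bounds omitted
tail = buf[2:]    # stop omitted
head = buf[:3]    # start omitted
```

With this normalization, `buf[:]` refers to the whole buffer, which is how the commit's `buf[:]` example reads.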
 
 ### Executor
 - One connection (unique QP) per channel entry instead of one per peer.
   Required for HostNoAtomic IB mode, where each QP can forward signals
   to a single semaphore. Uses per-peer tag counters so paired ranks
   agree on tag ordering regardless of the order peers appear in each
   rank's `connected_to` list.
 - MEMORY channels now unconditionally use `Transport::CudaIpc`; only
   PORT channels can use IB. Matches the invariant already enforced by
   `getTransportFlags`.
 - `ExecutionContext::connections` is now a `vector<Connection>` indexed
   by channel order (was `unordered_map<int, Connection>` keyed by
   peer). Removes redundant semaphore fields from `ExecutionContext`.
 - TODO: explicit NVL-domain check in `useIB`
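
The per-peer tag counter idea can be illustrated with a small Python sketch. The executor itself is C++, and the helper names (`assign_tags`, `tags_for_peer`) and the tag layout here are assumptions made for illustration only. The point is that counting channel entries per peer, rather than with one global counter, gives both ranks of a pair the same tag sequence even when their `connected_to` lists interleave peers differently.

```python
from collections import defaultdict


def assign_tags(connected_to):
    """Give each channel entry a tag counted per peer (hypothetical helper)."""
    next_tag = defaultdict(int)  # peer rank -> next unused tag for that peer
    assigned = []
    for peer in connected_to:
        assigned.append((peer, next_tag[peer]))
        next_tag[peer] += 1
    return assigned


def tags_for_peer(assigned, peer):
    """Tags this rank uses for its connections to `peer`, in channel order."""
    return [tag for p, tag in assigned if p == peer]


# Rank 0 and rank 1 list their peers in different orders.
rank0 = assign_tags([1, 2, 1])  # two channel entries to rank 1, one to rank 2
rank1 = assign_tags([2, 0, 0])  # two channel entries to rank 0, one to rank 2

# Both sides of the (0, 1) pair derive the same tag sequence.
pair_tags_on_rank0 = tags_for_peer(rank0, 1)
pair_tags_on_rank1 = tags_for_peer(rank1, 0)
```

A single global counter would instead give rank 0 tags 0 and 2 for rank 1, while rank 1 would use tags 1 and 2 for rank 0, so the two sides would disagree.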

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Changho Hwang <changhohwang@microsoft.com>
2026-06-19 13:19:01 -07:00