mirror of
https://github.com/microsoft/mscclpp.git
synced 2026-05-25 15:24:43 +00:00
- Internode HT test: accept MSCCLPP_EP_HT_{TOKENS,HIDDEN,TOPK,EXPERTS}
env vars to override the functional-check problem size (was hardcoded
to num_tokens=128, hidden=1024, num_topk=min(4,num_ranks),
num_experts=num_ranks*4).
- Both intranode + internode HT tests: replace dist.all_to_all_single
bookkeeping (per-(src,dst) recv-count matrix used for the
six-metric NVL/RDMA BW breakdown) with dist.all_gather_into_tensor
+ transpose. Functionally identical (gathered[:, rank] gives the
same recv-from-src column) but works on socket-NCCL with
NCCL_IB_DISABLE=1, which is required on rigs where NCCL IB cannot
coexist with mscclpp RDMA. Sends num_ranks^2 int64 instead of
num_ranks per rank — negligible (64 ints at 8 ranks).
24 KiB
24 KiB