mscclpp

mirror of https://github.com/microsoft/mscclpp.git synced 2026-07-17 09:17:25 +00:00

Qinghua Zhou 2391ce1de7 ext/ep tests: add optional HT benchmark pass

Gated behind MSCCLPP_EP_BENCH=1 to keep correctness runs fast. Reports
per-iteration latency (max across ranks, timed with CUDA events) and aggregate
effective bandwidth (summed across ranks, counting dispatch+combine payload bytes).
Tunable via MSCCLPP_EP_BENCH_WARMUP / _ITERS / _TOKENS / _HIDDEN.
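The gating and metric aggregation described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the test file's actual code: the helper names bench_config and aggregate, the default values, and the reading of "aggregate effective bandwidth" as a sum of per-rank (dispatch+combine bytes / per-rank latency) are all hypothetical.

```python
import os
from typing import Dict, List, Mapping, Optional, Tuple

# Hypothetical defaults; the real test's defaults are not shown in this listing.
_DEFAULTS = {"WARMUP": 5, "ITERS": 20, "TOKENS": 4096, "HIDDEN": 7168}


def bench_config(env: Optional[Mapping[str, str]] = None) -> Optional[Dict[str, int]]:
    """Return None when benchmarking is off, else the tunables as ints."""
    env = os.environ if env is None else env
    if env.get("MSCCLPP_EP_BENCH", "0") != "1":
        return None  # correctness-only run: skip the benchmark pass
    return {
        key.lower(): int(env.get(f"MSCCLPP_EP_BENCH_{key}", default))
        for key, default in _DEFAULTS.items()
    }


def aggregate(
    latency_s: List[float], dispatch_bytes: List[int], combine_bytes: List[int]
) -> Tuple[float, float]:
    """Combine per-rank measurements (index i = rank i).

    Latency: the slowest rank bounds the collective, so take the max.
    Bandwidth (assumed reading): each rank's dispatch+combine payload over
    its own per-iteration latency, summed across ranks, in bytes/s.
    """
    latency = max(latency_s)
    bandwidth = sum(
        (d + c) / t for d, c, t in zip(dispatch_bytes, combine_bytes, latency_s)
    )
    return latency, bandwidth
```

In a real multi-rank run, the per-rank values would first have to be gathered, for example with a collective reduction. The sketch leaves that step out so it stays self-contained.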

The benchmark reuses the Buffer allocated for the correctness phase and
skips itself if the requested hidden size exceeds the per-peer NVL/RDMA budget.
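The self-skip rule can be illustrated with a simplified size check. This is a hedged sketch: BufferBudget and should_skip are hypothetical names, and the sizing model is an assumption rather than the real kernel's layout. It assumes each peer needs tokens * hidden * elem_size bytes and that each buffer is split evenly across its peers. An RDMA size of 0 stands for an intranode-only run.

```python
from dataclasses import dataclass


@dataclass
class BufferBudget:
    # Hypothetical stand-in for the sizes of the Buffer allocated during the
    # correctness phase; the real object and its field names are not shown here.
    nvl_bytes: int
    rdma_bytes: int  # 0 means no RDMA buffer (intranode-only run)
    num_nvl_peers: int
    num_rdma_peers: int


def should_skip(budget: BufferBudget, tokens: int, hidden: int, elem_size: int = 2) -> bool:
    """Return True if the requested shape would overflow a per-peer budget.

    Assumed model: each peer needs tokens * hidden * elem_size bytes, and each
    buffer is divided evenly among its peers.
    """
    needed = tokens * hidden * elem_size
    per_peer_nvl = budget.nvl_bytes // max(budget.num_nvl_peers, 1)
    if needed > per_peer_nvl:
        return True
    if budget.rdma_bytes > 0:
        per_peer_rdma = budget.rdma_bytes // max(budget.num_rdma_peers, 1)
        if needed > per_peer_rdma:
            return True
    return False
```

Because the bench reuses the correctness-phase Buffer rather than reallocating, a check like this only compares the requested shape against sizes that are already fixed.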

2026-04-22 19:03:09 +00:00

test_ep_smoke.py

src/ext/ep: port low-latency dispatch/combine kernels

2026-04-20 21:46:00 +00:00

test_internode_multirank.py

ext/ep tests: add optional HT benchmark pass

2026-04-22 19:03:09 +00:00

test_intranode_multirank.py

ext/ep tests: add optional HT benchmark pass

2026-04-22 19:03:09 +00:00

test_low_latency_multirank.py

ext/ep: unfilter LL sync + add LL multirank test (intra-node WIP)

2026-04-22 06:11:30 +00:00