mscclpp/test/python/ext/ep/test_internode_multirank.py at 2391ce1de741fa920ddb43a222bf6b3c899d1736

mirror of https://github.com/microsoft/mscclpp.git synced 2026-05-13 01:36:10 +00:00


Qinghua Zhou 2391ce1de7 ext/ep tests: add optional HT benchmark pass

Gated behind MSCCLPP_EP_BENCH=1 to keep correctness runs fast. Reports
per-iter latency (max across ranks, CUDA-event timed) and aggregate
effective bandwidth (sum across ranks, dispatch+combine payload bytes).
Tunable via MSCCLPP_EP_BENCH_WARMUP / _ITERS / _TOKENS / _HIDDEN.
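The gating and reporting described above can be sketched roughly as follows. This is an illustration only, not the code in the test file: the helper names `bench_config` and `summarize` and all default values are hypothetical, and the bandwidth formula (summed payload bytes divided by the slowest rank's latency) is one plausible reading of "aggregate effective bandwidth". The real test times iterations with CUDA events and reduces across ranks; here the per-rank numbers are passed in as plain lists so the sketch runs without a GPU.

```python
import os


def bench_config(env=None):
    """Read the benchmark knobs named in the commit message.

    The default values below are placeholders, not the test's real defaults.
    """
    env = os.environ if env is None else env
    return {
        # The benchmark pass only runs when MSCCLPP_EP_BENCH is exactly "1".
        "enabled": env.get("MSCCLPP_EP_BENCH", "0") == "1",
        "warmup": int(env.get("MSCCLPP_EP_BENCH_WARMUP", "5")),
        "iters": int(env.get("MSCCLPP_EP_BENCH_ITERS", "20")),
        "tokens": int(env.get("MSCCLPP_EP_BENCH_TOKENS", "4096")),
        "hidden": int(env.get("MSCCLPP_EP_BENCH_HIDDEN", "7168")),
    }


def summarize(per_rank_ms, per_rank_bytes):
    """Combine per-rank measurements into the two reported numbers.

    per_rank_ms:    mean per-iteration latency of each rank, in milliseconds.
    per_rank_bytes: dispatch+combine payload bytes per iteration for each rank.
    Returns (latency_ms, bandwidth_gb_per_s).
    """
    latency_ms = max(per_rank_ms)      # slowest rank bounds the step
    total_bytes = sum(per_rank_bytes)  # payload summed across ranks
    bandwidth = total_bytes / (latency_ms * 1e-3) / 1e9  # assumed formula
    return latency_ms, bandwidth
```

A caller would check `bench_config()["enabled"]` before running the timed loop, so a plain correctness run never pays for the warmup and timed iterations.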

The benchmark reuses the Buffer allocated for the correctness phase and
skips itself if the requested hidden size exceeds the per-peer NVL/RDMA budget.

2026-04-22 19:03:09 +00:00

14 KiB
