Mirror of https://github.com/microsoft/mscclpp.git (synced 2026-05-24 14:54:51 +00:00)
Adds MSCCLPP_EP_NUM_SMS / MSCCLPP_EP_NVL_SEND / MSCCLPP_EP_NVL_RECV env overrides for ep.Config(num_sms, num_max_nvl_chunked_send_tokens, num_max_nvl_chunked_recv_tokens). Defaults unchanged (20, 8, 256).

Sweep on 4-rank intranode HT (tokens=4096, hidden=7168, experts=256):

  sms=20, NVL_SEND=8,  NVL_RECV=256 -> d_recv=50.76 GB/s, c_recv=65.66 GB/s
  sms=64, NVL_SEND=16, NVL_RECV=512 -> d_recv=57.75 GB/s, c_recv=150.46 GB/s

d_recv (actual NVL throughput per rank) plateaus at ~57 GB/s for topk>=2; combine recv (c_recv) scales near-linearly with num_sms.
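
A minimal sketch of how overrides like these could be resolved, not the code from this change. The helper names env_int and resolve_ep_config_args are hypothetical, and the parsing rules (empty value falls back to the default, non-positive values are rejected) are assumptions. Only the variable names, the defaults, and the argument order come from the message above.

```python
import os

# Defaults stated in the commit message, in ep.Config argument order:
# num_sms, num_max_nvl_chunked_send_tokens, num_max_nvl_chunked_recv_tokens.
DEFAULTS = {
    "MSCCLPP_EP_NUM_SMS": 20,
    "MSCCLPP_EP_NVL_SEND": 8,
    "MSCCLPP_EP_NVL_RECV": 256,
}


def env_int(name, default, env=None):
    """Hypothetical helper: int(env[name]) if set and non-empty, else default."""
    env = os.environ if env is None else env
    raw = env.get(name, "").strip()
    if not raw:
        return default
    value = int(raw)  # raises ValueError on non-numeric input
    if value <= 0:
        # Assumed validation; the real change may accept other values.
        raise ValueError(f"{name} must be positive, got {value}")
    return value


def resolve_ep_config_args(env=None):
    """Hypothetical helper: return (num_sms, nvl_send, nvl_recv) after overrides."""
    return tuple(env_int(name, default, env) for name, default in DEFAULTS.items())
```

Under these assumptions, the result could be passed as ep.Config(*resolve_ep_config_args()). With no variables set it yields the defaults (20, 8, 256); setting MSCCLPP_EP_NUM_SMS=64, MSCCLPP_EP_NVL_SEND=16 and MSCCLPP_EP_NVL_RECV=512 corresponds to the second sweep row.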