Commit Graph

16 Commits

Author SHA1 Message Date
Qinghua Zhou 520c890df5 Add debug variable MSCCLPP_DEBUG_ALLTOALLV to print 2026-04-02 04:39:48 +00:00
Qinghua Zhou 36940dbacf Match the message size for EP bench HT of 16 GPUs in test 6 2026-03-30 03:40:05 +00:00
Qinghua Zhou 62ab8883a6 Update multinode mode selection logic for IB and NVSwitch; add tests of EP-equivalent workloads 2026-03-30 01:34:53 +00:00
Qinghua Zhou ec011f14ea Add detection of torch baseline and debug info 2026-03-25 01:52:24 +00:00
Qinghua Zhou 7e1cb7b8cf Support cross-node CudaIPC 2026-03-21 10:41:32 +00:00
Qinghua Zhou 9ef1fb7cee Pass the multinode test 2026-03-18 17:08:22 +00:00
Qinghua Zhou bdb30b56a5 Broadcast UniqueId via TCP; detect whether torch comparison is possible 2026-03-16 10:01:35 +00:00
Qinghua Zhou f47e97659d Update the benchmark to improve rank mapping, communicator creation, and backend selection 2026-03-16 09:25:34 +00:00
Qinghua Zhou d00713d3c2 Add more real MoE workloads for alltoallv 2026-03-02 12:51:21 +00:00
Qinghua Zhou ee843d445f Add test of real MoE workloads 2026-02-25 12:39:48 +00:00
Qinghua Zhou ae59eab6a2 Add unified benchmarking function to test all_to_all_single of mscclpp and torch 2026-02-24 07:17:17 +00:00
Qinghua Zhou 715ecd91cf Add baseline test of torch.distributed.all_to_all_single 2026-02-24 06:51:10 +00:00
Qinghua Zhou 98be0def08 Use variable sizes in the performance test 2026-02-24 06:29:46 +00:00
Qinghua Zhou 6292b6ab33 Report unidirectional bandwidth 2026-02-24 06:02:33 +00:00
Qinghua Zhou 21e3f1ebb3 Get correct remote receive displacements for peers 2026-02-23 14:22:30 +00:00
Qinghua Zhou 7ba83e20dd PyTorch-compatible all_to_all_single API using mscclpp kernels 2026-02-23 09:51:51 +00:00