Commit Graph

13 Commits

Author SHA1 Message Date
Qinghua Zhou
ec011f14ea Add detection of torch.baseline and debug info 2026-03-25 01:52:24 +00:00
Qinghua Zhou
7e1cb7b8cf Support cross-node CudaIPC 2026-03-21 10:41:32 +00:00
Qinghua Zhou
9ef1fb7cee Pass the multinode test 2026-03-18 17:08:22 +00:00
Qinghua Zhou
bdb30b56a5 Broadcast UniqueId via TCP; Detect whether torch comparison is possible 2026-03-16 10:01:35 +00:00
Qinghua Zhou
f47e97659d Update the benchmark to improve the rank mapping, communicator creation, backend selection 2026-03-16 09:25:34 +00:00
Qinghua Zhou
d00713d3c2 Add more real MoE workloads for alltoallv 2026-03-02 12:51:21 +00:00
Qinghua Zhou
ee843d445f Add test of real MoE workloads 2026-02-25 12:39:48 +00:00
Qinghua Zhou
ae59eab6a2 Add unified benchmarking function to test all_to_all_single of mscclpp and torch 2026-02-24 07:17:17 +00:00
Qinghua Zhou
715ecd91cf Add baseline test of torch.distributed.all_to_all_single 2026-02-24 06:51:10 +00:00
Qinghua Zhou
98be0def08 Use variable sizes in the performance test 2026-02-24 06:29:46 +00:00
Qinghua Zhou
6292b6ab33 Report unidirectional bandwidth 2026-02-24 06:02:33 +00:00
Qinghua Zhou
21e3f1ebb3 Get correct remote receive displacements for peers 2026-02-23 14:22:30 +00:00
Qinghua Zhou
7ba83e20dd PyTorch-compatible all_to_all_single API using mscclpp kernels 2026-02-23 09:51:51 +00:00