Qinghua Zhou | 1071ddb050 | Update the benchmark to improve the rank mapping, communicator creation, backend selection | 2026-03-10 03:17:12 +00:00
Qinghua Zhou | d00713d3c2 | Add more real MoE workloads for alltoallv | 2026-03-02 12:51:21 +00:00
Qinghua Zhou | ee843d445f | Add test of real MoE workloads | 2026-02-25 12:39:48 +00:00
Qinghua Zhou | ae59eab6a2 | Add unified benchmarking function to test all_to_all_single of mscclpp and torch | 2026-02-24 07:17:17 +00:00
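A unified benchmarking function typically times every backend through one shared warmup-and-iterate loop so the two implementations are measured identically. A minimal backend-agnostic sketch (the `benchmark` helper and its `warmup`/`iters` parameters are illustrative, not taken from the commit):

```python
import time

def benchmark(fn, *args, warmup=5, iters=20):
    """Time fn(*args): run warmup calls first, then return mean seconds per call."""
    for _ in range(warmup):              # trigger lazy init, warm caches
        fn(*args)
    start = time.perf_counter()
    for _ in range(iters):
        fn(*args)
    return (time.perf_counter() - start) / iters

# The same harness can then wrap both backends, e.g. a torch
# all_to_all_single call and an mscclpp one, with identical arguments.
mean_s = benchmark(sum, range(1000))
print(mean_s >= 0.0)
```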
Qinghua Zhou | 715ecd91cf | Add baseline test of torch.distributed.all_to_all_single | 2026-02-24 06:51:10 +00:00
Qinghua Zhou | 98be0def08 | Use variable sizes in the performance test | 2026-02-24 06:29:46 +00:00
Qinghua Zhou | 6292b6ab33 | Report unidirectional bandwidth | 2026-02-24 06:02:33 +00:00
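One common convention for all-to-all unidirectional bandwidth (used, for example, by nccl-tests-style benchmarks) is that each rank ships (n-1)/n of its buffer off-rank, so the reported bandwidth scales the raw buffer size by that factor. Whether this benchmark uses exactly this formula is an assumption; a sketch:

```python
def unidirectional_bw_gb_s(buffer_bytes, elapsed_s, world_size):
    """Per-rank all-to-all bandwidth: each rank sends (n-1)/n of its
    buffer to peers in elapsed_s seconds. Returns GB/s (1 GB = 1e9 B)."""
    sent = buffer_bytes * (world_size - 1) / world_size
    return sent / elapsed_s / 1e9

# e.g. a 1 GiB buffer exchanged in 10 ms across 8 ranks
print(round(unidirectional_bw_gb_s(2**30, 0.010, 8), 2))  # → 93.95
```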
Qinghua Zhou | 21e3f1ebb3 | Get correct remote receive displacements for peers | 2026-02-23 14:22:30 +00:00
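In alltoallv, the receive displacement for peer p is the exclusive prefix sum of the receive counts before it, and rank r's receive count from peer p equals what p sends to r (column r of the global send matrix). A sketch of that bookkeeping (helper names are illustrative, not from the commit):

```python
def recv_displacements(recv_counts):
    """Exclusive prefix sum: offset in the receive buffer at which
    data from each peer lands."""
    displs, offset = [], 0
    for count in recv_counts:
        displs.append(offset)
        offset += count
    return displs

# send[i][j]: number of elements rank i sends to rank j
send = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
rank = 1
recv_counts = [row[rank] for row in send]   # column 1 → [2, 5, 8]
print(recv_displacements(recv_counts))      # → [0, 2, 7]
```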
Qinghua Zhou | 7ba83e20dd | PyTorch-compatible all_to_all_single API using mscclpp kernels | 2026-02-23 09:51:51 +00:00
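A PyTorch-compatible all_to_all_single must reproduce the semantics of `torch.distributed.all_to_all_single`: each rank's flat input is split per peer by its split sizes, and its output is the concatenation of the chunks every peer addressed to it. A pure-Python simulation of those semantics over plain lists (the `all_to_all_single_sim` helper is hypothetical, for illustration only):

```python
def all_to_all_single_sim(inputs, split_sizes):
    """Simulate all_to_all_single over lists.
    inputs[r]: rank r's flat send buffer.
    split_sizes[r][p]: how many elements rank r sends to rank p.
    Returns each rank's assembled output buffer."""
    n = len(inputs)
    # Slice each rank's input into its per-peer chunks.
    chunks = []
    for r in range(n):
        offset, per_peer = 0, []
        for p in range(n):
            per_peer.append(inputs[r][offset:offset + split_sizes[r][p]])
            offset += split_sizes[r][p]
        chunks.append(per_peer)
    # Rank r's output concatenates chunks[p][r] over all peers p.
    return [[x for p in range(n) for x in chunks[p][r]] for r in range(n)]

ins = [[10, 11, 12], [20, 21, 22]]
splits = [[1, 2],                 # rank 0: 1 elem to rank 0, 2 to rank 1
          [2, 1]]                 # rank 1: 2 elems to rank 0, 1 to rank 1
print(all_to_all_single_sim(ins, splits))  # → [[10, 20, 21], [11, 12, 22]]
```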