90 Commits

Author SHA1 Message Date
Changho Hwang
b6ea0ca266 IB unit test (#47) 2023-04-07 21:45:14 +08:00
Saeed Maleki
2c6460ce72 bug fix for allgather0 2023-04-03 04:36:20 +00:00
Saeed Maleki
5ff64d36f4 documents for allgather2 + refactoring local allgather 2023-04-02 03:36:22 +00:00
Saeed Maleki
0887cfe768 no need for remapping anymore 2023-04-02 02:35:08 +00:00
Saeed Maleki
701255959e lint 2023-03-31 23:34:43 +00:00
Saeed Maleki
97dadd8d64 merged with main 2023-03-31 23:32:01 +00:00
Saeed Maleki
44e8760af7 allgather kernel2 2023-03-31 06:31:25 +00:00
Saeed Maleki
fef0bff945 a third kernel for allgather cross-node 2023-03-30 23:24:04 +00:00
Saeed Maleki
29254439e5 Merge pull request #38 from microsoft/saemal/removing_gdrcopy 2023-03-30 13:17:47 -07:00
    removing gdrcopy and adding flush functionality
Saeed Maleki
debd110874 fused flush instructions 2023-03-29 22:17:02 +00:00
Saeed Maleki
d97bee6973 flush mechanism 2023-03-29 17:31:20 +00:00
Bin Wang
7880be8ee2 Fix the 2 GiB limit in allgather test. (#36) 2023-03-29 19:02:43 +08:00
Saeed Maleki
43c52367fb merged with main and simplified the callback requirements 2023-03-27 23:41:27 +00:00
Saeed Maleki
19bf369dc1 link format correction 2023-03-27 20:40:15 +00:00
Saeed Maleki
0898214f0a added mscclppGetErrorString 2023-03-24 22:57:14 +00:00
Saeed Maleki
56b599b5e7 a bit of api change and clean up on docs 2023-03-24 17:41:04 +00:00
Changho Hwang
274e921009 Minor fixes 2023-03-24 07:28:30 +00:00
Saeed Maleki
c042112b6b perf debug for allgather 2023-03-24 06:49:38 +00:00
Saeed Maleki
777e93ee47 merged with main 2023-03-24 02:35:15 +00:00
Saeed Maleki
56d86472e6 done with allgather_test commenting 2023-03-24 00:22:47 +00:00
Saeed Maleki
58595b1410 Merge pull request #23 from microsoft/saemal/apipush 2023-03-23 16:17:55 -07:00
    cleaner allgather_test
Saeed Maleki
86f79ef442 cleaner allgather_test 2023-03-23 23:13:00 +00:00
Madan Musuvathi
e6ee81e4fa fixed the order of remote rank and tag in mscclppConnect API 2023-03-23 21:09:04 +00:00
Madan Musuvathi
72edabe2a6 added GetDevConn api to retrieve a connection from remoteRank and tag 2023-03-23 21:03:30 +00:00
Madan Musuvathi
896539b236 Comm owns all state including devcons 2023-03-22 22:43:32 +00:00
Saeed Maleki
e1cd88ca0b bootstrap test with and without uniq_id 2023-03-22 21:11:27 +00:00
Madan Musuvathi
4c459aa0df allgather_test code cleanup 2023-03-22 20:38:29 +00:00
Madan Musuvathi
261fd7f838 allgather_test code cleanup 2023-03-22 18:50:23 +00:00
Madan Musuvathi
44c6b94747 api version 1 2023-03-22 18:28:30 +00:00
Madan Musuvathi
6ea460bb3a fusing signal with sync 2023-03-22 18:16:42 +00:00
Changho Hwang
9f2eef35d3 Init from a given mscclppUniqueId 2023-03-22 11:25:49 +00:00
Changho Hwang
61511473cb Fix bootstrap_test w/o MPI 2023-03-22 10:03:30 +00:00
Saeed Maleki
483b0c8433 flag is now allocated by the system 2023-03-22 05:14:24 +00:00
Saeed Maleki
0a707d84ec new api works -- single node is not performant 2023-03-22 02:19:49 +00:00
Saeed Maleki
b75f9e6d8a implementing new API 2023-03-22 00:29:10 +00:00
Saeed Maleki
aa1a37ab4d first version 2023-03-21 21:34:19 +00:00
Olli Saarikivi
0cfe2dcffb Add allpairs allreduce test 2023-03-21 19:00:13 +00:00
    To support this include separate source and destination offsets in the trigger.
    Add functions for getting the rank and world size from a communicator.
Saeed Maleki
e2ee8d80b9 perf fix for multi-node allgather 2023-03-21 06:26:12 +00:00
Saeed Maleki
8b30121240 Merge branch 'main' into chhwang/fix-trigger 2023-03-20 23:12:58 +00:00
Saeed Maleki
7cb2903799 some comment check ins 2023-03-20 21:07:58 +00:00
Saeed Maleki
93afed3e54 new allgather algorithm with both DMA and IB on a single node 2023-03-19 21:53:36 +00:00
Saeed Maleki
8a1ec28ff1 single node allgather works very well 2023-03-19 19:27:17 +00:00
Saeed Maleki
3e8f6758e5 both allgather algorithms 2023-03-19 06:35:40 +00:00
Saeed Maleki
17cbc84a14 both allgather algorithms 2023-03-19 06:35:32 +00:00
Saeed Maleki
a485a7f238 single node works fine -- multinode is problematic 2023-03-19 01:08:05 +00:00
Saeed Maleki
9cc21f70e6 redesigning fifo 2023-03-17 22:51:11 +00:00
Saeed Maleki
73df12358f Merge branch 'main' of https://github.com/microsoft/mscclpp into main 2023-03-17 17:54:17 +00:00
Saeed Maleki
e86df92fa5 fixed a typo in debugging information 2023-03-17 17:52:53 +00:00
Changho Hwang
67dbbd1692 Thread-safe trigger 2023-03-17 09:46:23 +00:00
Saeed Maleki
2061ea91f7 Add allgather_test (#14) 2023-03-17 12:55:20 +08:00