Commit Graph

175 Commits

Author SHA1 Message Date
Saeed Maleki
0f31dafed5 Merge pull request #27 from microsoft/chhwang/accept-timeout
30 sec timeout for socket accept
2023-03-24 12:46:34 -07:00
Saeed Maleki
b07508b8f3 removed clockSec since it is not used 2023-03-24 19:43:41 +00:00
Saeed Maleki
35b8ebaf64 retry for almost 20 seconds 2023-03-24 19:42:00 +00:00
Saeed Maleki
3fb9383621 Merge pull request #24 from microsoft/madanm-apipush
simplified API for CUDA level communication calls.
2023-03-24 10:42:07 -07:00
Saeed Maleki
56b599b5e7 a bit of api change and clean up on docs 2023-03-24 17:41:04 +00:00
Changho Hwang
551eae0ba1 Update docs 2023-03-24 09:28:12 +00:00
Changho Hwang
7a4c27778f 30 sec timeout for socket accept 2023-03-24 08:29:00 +00:00
Changho Hwang
274e921009 Minor fixes 2023-03-24 07:28:30 +00:00
Changho Hwang
b056db1e1f Merge pull request #26 from microsoft/ziyyang/npkit-pr
Port NPKit
2023-03-24 14:52:59 +08:00
Saeed Maleki
f7dcea914d Merge branch 'madanm-apipush' of https://github.com/microsoft/mscclpp into madanm-apipush 2023-03-24 06:49:53 +00:00
Saeed Maleki
c042112b6b perf debug for allgather 2023-03-24 06:49:38 +00:00
Ziyue Yang
f92b428cba Port NPKit 2023-03-24 06:41:16 +00:00
Changho Hwang
6a10f6135f Merge pull request #25 from microsoft/binyli/bugfix
Fix postSend bug
2023-03-24 14:23:29 +08:00
Binyang Li
68a258fce5 Fix postSend bug 2023-03-24 05:31:10 +00:00
Changho Hwang
e7459032e0 Add patch version 2023-03-24 05:19:25 +00:00
Changho Hwang
05fde6c6f3 minor changes 2023-03-24 04:51:20 +00:00
Saeed Maleki
777e93ee47 merged with main 2023-03-24 02:35:15 +00:00
Saeed Maleki
56d86472e6 done with allgather_test commenting 2023-03-24 00:22:47 +00:00
Saeed Maleki
58595b1410 Merge pull request #23 from microsoft/saemal/apipush
cleaner allgather_test
2023-03-23 16:17:55 -07:00
Saeed Maleki
86f79ef442 cleaner allgather_test 2023-03-23 23:13:00 +00:00
Madan Musuvathi
66a3eb08c2 documenting the devCon API 2023-03-23 21:44:04 +00:00
Madan Musuvathi
208babfeb7 Merge branch 'madanm-apipush' of https://github.com/microsoft/mscclpp into madanm-apipush 2023-03-23 21:11:20 +00:00
Madan Musuvathi
e6ee81e4fa fixed the order of remote rank and tag in mscclppConnect API 2023-03-23 21:09:04 +00:00
Madan Musuvathi
72edabe2a6 added GetDevConn api to retrieve a connection from remoteRank and tag 2023-03-23 21:03:30 +00:00
Saeed Maleki
3e6bb0ec0c minor changes 2023-03-23 04:47:34 +00:00
Saeed Maleki
bf01f063fd Merge pull request #21 from microsoft/chhwang/docs
Update docs
2023-03-22 21:43:51 -07:00
Changho Hwang
ce660217b1 Update docs 2023-03-23 04:14:25 +00:00
Saeed Maleki
ea71849dca more docs 2023-03-23 02:23:15 +00:00
Madan Musuvathi
e569175832 added documentation 2023-03-23 00:39:45 +00:00
Madan Musuvathi
7de21eba6f created a separate fifo class 2023-03-23 00:03:33 +00:00
Madan Musuvathi
896539b236 Comm owns all state including devcons 2023-03-22 22:43:32 +00:00
Saeed Maleki
febebde26a Merge pull request #20 from microsoft/chhwang/dealloc
Dealloc more resources
2023-03-22 14:48:57 -07:00
Saeed Maleki
270839797e Merge branch 'main' into chhwang/dealloc 2023-03-22 21:14:42 +00:00
Saeed Maleki
1e5d8976d4 Merge pull request #19 from microsoft/chhwang/get-unique-id
Init from a given mscclppUniqueId
2023-03-22 14:12:43 -07:00
Saeed Maleki
e1cd88ca0b bootstrap test with and without uniq_id 2023-03-22 21:11:27 +00:00
Madan Musuvathi
4c459aa0df allgather_test code cleanup 2023-03-22 20:38:29 +00:00
Madan Musuvathi
261fd7f838 allgather_test code cleanup 2023-03-22 18:50:23 +00:00
Madan Musuvathi
44c6b94747 api version 1 2023-03-22 18:28:30 +00:00
Madan Musuvathi
6ea460bb3a fusing signal with sync 2023-03-22 18:16:42 +00:00
Changho Hwang
48a23243a4 Dealloc more resources 2023-03-22 12:06:35 +00:00
Changho Hwang
9f2eef35d3 Init from a given mscclppUniqueId 2023-03-22 11:25:49 +00:00
Changho Hwang
61511473cb Fix bootstrap_test w/o MPI 2023-03-22 10:03:30 +00:00
Changho Hwang
9a6ddfd244 Update makefile 2023-03-22 09:19:47 +00:00
Saeed Maleki
483b0c8433 flag is now allocated by the system 2023-03-22 05:14:24 +00:00
Madan Musuvathi
feea5c6f30 minor change 2023-03-22 02:51:01 +00:00
Saeed Maleki
0a707d84ec new api works -- single node is not performant 2023-03-22 02:19:49 +00:00
Saeed Maleki
b75f9e6d8a implementing new API 2023-03-22 00:29:10 +00:00
Saeed Maleki
aa1a37ab4d first version 2023-03-21 21:34:19 +00:00
Olli Saarikivi
82df24018d Merge pull request #16 from microsoft/olli/allreduce-allpairs
Add Allpairs Allreduce algorithm and test.
Trigger now has separate source and destination offsets.
Add functions for getting the rank and world size from a communicator.
2023-03-21 14:01:25 -07:00
Olli Saarikivi
0cfe2dcffb Add allpairs allreduce test
To support this include separate source and destination offsets in the trigger.
Add functions for getting the rank and world size from a communicator.
2023-03-21 19:00:13 +00:00