Saeed Maleki
0f31dafed5
Merge pull request #27 from microsoft/chhwang/accept-timeout
...
30 sec timeout for socket accept
2023-03-24 12:46:34 -07:00
Saeed Maleki
b07508b8f3
removed clockSec since it is not used
2023-03-24 19:43:41 +00:00
Saeed Maleki
35b8ebaf64
retry for almost 20 seconds
2023-03-24 19:42:00 +00:00
Saeed Maleki
3fb9383621
Merge pull request #24 from microsoft/madanm-apipush
...
simplified API for CUDA level communication calls.
2023-03-24 10:42:07 -07:00
Saeed Maleki
56b599b5e7
a bit of api change and clean up on docs
2023-03-24 17:41:04 +00:00
Changho Hwang
551eae0ba1
Update docs
2023-03-24 09:28:12 +00:00
Changho Hwang
7a4c27778f
30 sec timeout for socket accept
2023-03-24 08:29:00 +00:00
Changho Hwang
274e921009
Minor fixes
2023-03-24 07:28:30 +00:00
Changho Hwang
b056db1e1f
Merge pull request #26 from microsoft/ziyyang/npkit-pr
...
Port NPKit
2023-03-24 14:52:59 +08:00
Saeed Maleki
f7dcea914d
Merge branch 'madanm-apipush' of https://github.com/microsoft/mscclpp into madanm-apipush
2023-03-24 06:49:53 +00:00
Saeed Maleki
c042112b6b
perf debug for allgather
2023-03-24 06:49:38 +00:00
Ziyue Yang
f92b428cba
Port NPKit
2023-03-24 06:41:16 +00:00
Changho Hwang
6a10f6135f
Merge pull request #25 from microsoft/binyli/bugfix
...
Fix postSend bug
2023-03-24 14:23:29 +08:00
Binyang Li
68a258fce5
Fix postSend bug
2023-03-24 05:31:10 +00:00
Changho Hwang
e7459032e0
Add patch version
2023-03-24 05:19:25 +00:00
Changho Hwang
05fde6c6f3
minor changes
2023-03-24 04:51:20 +00:00
Saeed Maleki
777e93ee47
merged with main
2023-03-24 02:35:15 +00:00
Saeed Maleki
56d86472e6
done with allgather_test commenting
2023-03-24 00:22:47 +00:00
Saeed Maleki
58595b1410
Merge pull request #23 from microsoft/saemal/apipush
...
cleaner allgather_test
2023-03-23 16:17:55 -07:00
Saeed Maleki
86f79ef442
cleaner allgather_test
2023-03-23 23:13:00 +00:00
Madan Musuvathi
66a3eb08c2
documenting the devCon API
2023-03-23 21:44:04 +00:00
Madan Musuvathi
208babfeb7
Merge branch 'madanm-apipush' of https://github.com/microsoft/mscclpp into madanm-apipush
2023-03-23 21:11:20 +00:00
Madan Musuvathi
e6ee81e4fa
fixed the order of remote rank and tag in mscclppConnect API
2023-03-23 21:09:04 +00:00
Madan Musuvathi
72edabe2a6
added GetDevConn api to retrieve a connection from remoteRank and tag
2023-03-23 21:03:30 +00:00
Saeed Maleki
3e6bb0ec0c
minor changes
2023-03-23 04:47:34 +00:00
Saeed Maleki
bf01f063fd
Merge pull request #21 from microsoft/chhwang/docs
...
Update docs
2023-03-22 21:43:51 -07:00
Changho Hwang
ce660217b1
Update docs
2023-03-23 04:14:25 +00:00
Saeed Maleki
ea71849dca
more docs
2023-03-23 02:23:15 +00:00
Madan Musuvathi
e569175832
added documentation
2023-03-23 00:39:45 +00:00
Madan Musuvathi
7de21eba6f
created a separate fifo class
2023-03-23 00:03:33 +00:00
Madan Musuvathi
896539b236
Comm owns all state including devcons
2023-03-22 22:43:32 +00:00
Saeed Maleki
febebde26a
Merge pull request #20 from microsoft/chhwang/dealloc
...
Dealloc more resources
2023-03-22 14:48:57 -07:00
Saeed Maleki
270839797e
Merge branch 'main' into chhwang/dealloc
2023-03-22 21:14:42 +00:00
Saeed Maleki
1e5d8976d4
Merge pull request #19 from microsoft/chhwang/get-unique-id
...
Init from a given mscclppUniqueId
2023-03-22 14:12:43 -07:00
Saeed Maleki
e1cd88ca0b
bootstrap test with and without uniq_id
2023-03-22 21:11:27 +00:00
Madan Musuvathi
4c459aa0df
allgather_test code cleanup
2023-03-22 20:38:29 +00:00
Madan Musuvathi
261fd7f838
allgather_test code cleanup
2023-03-22 18:50:23 +00:00
Madan Musuvathi
44c6b94747
api version 1
2023-03-22 18:28:30 +00:00
Madan Musuvathi
6ea460bb3a
fusing signal with sync
2023-03-22 18:16:42 +00:00
Changho Hwang
48a23243a4
Dealloc more resources
2023-03-22 12:06:35 +00:00
Changho Hwang
9f2eef35d3
Init from a given mscclppUniqueId
2023-03-22 11:25:49 +00:00
Changho Hwang
61511473cb
Fix bootstrap_test w/o MPI
2023-03-22 10:03:30 +00:00
Changho Hwang
9a6ddfd244
Update makefile
2023-03-22 09:19:47 +00:00
Saeed Maleki
483b0c8433
flag is now allocated by the system
2023-03-22 05:14:24 +00:00
Madan Musuvathi
feea5c6f30
minor change
2023-03-22 02:51:01 +00:00
Saeed Maleki
0a707d84ec
new api works -- single node is not performant
2023-03-22 02:19:49 +00:00
Saeed Maleki
b75f9e6d8a
implementing new API
2023-03-22 00:29:10 +00:00
Saeed Maleki
aa1a37ab4d
first version
2023-03-21 21:34:19 +00:00
Olli Saarikivi
82df24018d
Merge pull request #16 from microsoft/olli/allreduce-allpairs
...
Add Allpairs Allreduce algorithm and test.
Trigger now has separate source and destination offsets.
Add functions for getting the rank and world size from a communicator.
2023-03-21 14:01:25 -07:00
Olli Saarikivi
0cfe2dcffb
Add allpairs allreduce test
...
To support this, include separate source and destination offsets in the trigger.
Add functions for getting the rank and world size from a communicator.
2023-03-21 19:00:13 +00:00