Commit Graph

157 Commits

Author SHA1 Message Date
Saeed Maleki
0887cfe768 no need for remapping anymore 2023-04-02 02:35:08 +00:00
Saeed Maleki
1731911d00 removing extra stream and destroying created ones 2023-04-02 02:07:41 +00:00
Saeed Maleki
4c6616e7b9 lint 2023-04-01 19:20:50 +00:00
Saeed Maleki
8927dd4d72 great allgather numbers with the current binding mechanism 2023-04-01 18:54:42 +00:00
Saeed Maleki
97dadd8d64 merged with main 2023-03-31 23:32:01 +00:00
Binyang Li
8e4edd4d83 fix lint 2023-03-31 08:32:45 +00:00
Binyang Li
af5825b474 bind numa node to communicator 2023-03-31 08:05:49 +00:00
Saeed Maleki
44e8760af7 allgather kernel2 2023-03-31 06:31:25 +00:00
Changho Hwang
fe1d7fee9e Bug Fix: null-termination in logging 2023-03-31 05:25:07 +00:00
Changho Hwang
b58eae4037 Minor changes 2023-03-30 07:11:41 +00:00
Saeed Maleki
e2cfd5ac83 a lot of documentation 2023-03-30 00:37:33 +00:00
Saeed Maleki
be5e422021 merged with main 2023-03-29 23:03:12 +00:00
Saeed Maleki
629d59a9c0 bug fix -- flush doesn't need to increment the epoch 2023-03-29 22:21:24 +00:00
Saeed Maleki
debd110874 fused flush instructions 2023-03-29 22:17:02 +00:00
Saeed Maleki
42b11c5c9a fast flush 2023-03-29 20:50:01 +00:00
Saeed Maleki
d97bee6973 flush mechanism 2023-03-29 17:31:20 +00:00
Binyang Li
d725e45f13 fix 2023-03-28 14:53:08 +00:00
Binyang Li
9c633a9633 bug fix 2023-03-28 14:40:51 +00:00
Binyang Li
487030887b refactor 2023-03-28 12:22:43 +00:00
Saeed Maleki
17e144c774 a typo in p2p proxy 2023-03-28 08:07:54 +00:00
Saeed Maleki
81b18cd9f9 a bit of clean up 2023-03-28 06:08:12 +00:00
Binyang2014
62279b0063 Add mscclppSetBootstrapConnTimeout (#34) 2023-03-28 14:01:56 +08:00
Saeed Maleki
fa26bdd9fc no gdr copy anywhere in the code except for the files that are not compiled 2023-03-28 05:40:40 +00:00
Saeed Maleki
33af4bfb67 no gdr copy anywhere in the code except for the files that are not compiled 2023-03-28 05:36:31 +00:00
Saeed Maleki
d9ba953fb0 gdrcopy is not initialized 2023-03-28 04:56:06 +00:00
Saeed Maleki
e7cccbf897 both head and tail are on OK to be only used by GPU 2023-03-28 04:26:39 +00:00
Saeed Maleki
952d852256 both head and tail are on OK to be only used by GPU 2023-03-28 04:24:45 +00:00
Ziyue Yang
b234cf5012 NPKit: add DMA events and fix bandwidth calculation (#33) 2023-03-28 09:58:32 +08:00
Saeed Maleki
32c4498fb8 typo fixes 2023-03-28 00:55:41 +00:00
Saeed Maleki
75036c0f12 typo fixes 2023-03-28 00:50:59 +00:00
Saeed Maleki
5adf3e3755 typo fix 2023-03-27 23:42:43 +00:00
Saeed Maleki
43c52367fb merged with main and simplified the callback requirements 2023-03-27 23:41:27 +00:00
Saeed Maleki
19bf369dc1 link format correction 2023-03-27 20:40:15 +00:00
Changho Hwang
8fc8f5b4fe Lint 2023-03-27 14:09:26 +00:00
Changho Hwang
8e4146aba9 Add mscclppSetLogHandler 2023-03-27 13:33:07 +00:00
Saeed Maleki
35ca25781a an important deadlock bug fix 2023-03-26 02:09:05 +00:00
Saeed Maleki
0898214f0a added mscclppGetErrorString 2023-03-24 22:57:14 +00:00
Saeed Maleki
0f31dafed5 Merge pull request #27 from microsoft/chhwang/accept-timeout: 30 sec timeout for socket accept 2023-03-24 12:46:34 -07:00
Saeed Maleki
b07508b8f3 removed clockSec since it is not used 2023-03-24 19:43:41 +00:00
Saeed Maleki
35b8ebaf64 retry for almost 20 seconds 2023-03-24 19:42:00 +00:00
Saeed Maleki
3fb9383621 Merge pull request #24 from microsoft/madanm-apipush: simplified API for CUDA level communication calls. 2023-03-24 10:42:07 -07:00
Saeed Maleki
56b599b5e7 a bit of api change and clean up on docs 2023-03-24 17:41:04 +00:00
Changho Hwang
551eae0ba1 Update docs 2023-03-24 09:28:12 +00:00
Changho Hwang
7a4c27778f 30 sec timeout for socket accept 2023-03-24 08:29:00 +00:00
Ziyue Yang
f92b428cba Port NPKit 2023-03-24 06:41:16 +00:00
Binyang Li
68a258fce5 Fix postSend bug 2023-03-24 05:31:10 +00:00
Changho Hwang
e7459032e0 Add patch version 2023-03-24 05:19:25 +00:00
Changho Hwang
05fde6c6f3 minor changes 2023-03-24 04:51:20 +00:00
Saeed Maleki
777e93ee47 merged with main 2023-03-24 02:35:15 +00:00
Madan Musuvathi
66a3eb08c2 documenting the devCon API 2023-03-23 21:44:04 +00:00