70 Commits

Author
SHA1 Message Date
Saeed Maleki
82c27625e6 ipc uses a base ptr now 2023-04-27 21:33:15 +00:00
Changho Hwang
08e80f1754 IB: completely replaced with C++ interfaces 2023-04-27 04:01:46 +00:00
Olli Saarikivi
83c7ba1afb C++ API working, allgather_test_cpp passing 2023-04-19 17:11:21 +00:00
Olli Saarikivi
65597e1f63 Fix a copy-paste mistake 2023-04-14 23:35:10 +00:00
Olli Saarikivi
46790d79e8 Implement C API buffer registration support 2023-04-14 23:20:42 +00:00
Changho Hwang
7c2108d135 fix 2023-04-12 18:05:27 +00:00
Changho Hwang
1d3ea7bb83 fix 2023-04-12 17:25:54 +00:00
Changho Hwang
ca1f803692 Rename remote MR infos 2023-04-12 09:33:14 +00:00
Changho Hwang
dd0883b84f Lint 2023-04-12 09:25:35 +00:00
Changho Hwang
63a5be6953 Move ibQp to mscclppHostIBConn 2023-04-12 09:20:05 +00:00
Changho Hwang
bc729cd481 Move MRs / MR infos to mscclppHostIBConn & cleanup 2023-04-12 09:05:42 +00:00
Changho Hwang
fd3f928108 remove hostFifo & rename devFifo to just fifo 2023-04-12 08:08:19 +00:00
Saeed Maleki
edc3c237ed deleteing hostconn 2023-04-12 04:09:12 +00:00
Madan Musuvathi
9124856ea4 first version hostConn 2023-04-12 01:36:06 +00:00
Changho Hwang
7a0e64813a Add fifo for host connections 2023-04-11 12:28:45 +00:00
Changho Hwang
35acdf796c Add mscclppProxyFifo 2023-04-11 11:28:40 +00:00
Changho Hwang
d2c2ae72a7 Some cleanup 2023-04-11 08:45:22 +00:00
Saeed Maleki
b6179224aa lint 2023-04-11 01:36:37 +00:00
Saeed Maleki
48102a0858 removing unnecessary flags 2023-04-11 01:22:40 +00:00
Changho Hwang
a1ae982c61 Merge signalEpochId with proxySignalEpochId 2023-04-10 14:05:25 +00:00
Saeed Maleki
426e78997c name changes + documentation for clarity 2023-04-09 02:20:54 +00:00
Felipe Petroski Such
cc8c30f958 error checking 2023-04-07 15:38:48 -07:00
Felipe Petroski Such
38cd87cdcc add memory region functions 2023-04-07 15:38:48 -07:00
Changho Hwang
949a9cd0a3 Optional use of gdrcopy (#48) 2023-04-07 13:36:59 +08:00
    Co-authored-by: Saeed Maleki <saemal@microsoft.com>
Saeed Maleki
cd3cd2c157 lint 2023-04-06 03:20:21 +00:00
Saeed Maleki
08275e93d7 added barrier API + pushed one after mscclppsetup 2023-04-06 03:15:54 +00:00
Saeed Maleki
1731911d00 removing extra stream and destroying created ones 2023-04-02 02:07:41 +00:00
Saeed Maleki
4c6616e7b9 lint 2023-04-01 19:20:50 +00:00
Saeed Maleki
8927dd4d72 great allgather numbers with the current binding mechanism 2023-04-01 18:54:42 +00:00
Binyang Li
af5825b474 bind numa node to communicator 2023-03-31 08:05:49 +00:00
Saeed Maleki
be5e422021 merged with main 2023-03-29 23:03:12 +00:00
Binyang2014
62279b0063 Add mscclppSetBootstrapConnTimeout (#34) 2023-03-28 14:01:56 +08:00
Saeed Maleki
33af4bfb67 no gdr copy anywhere in the code except for the files that are not compiled 2023-03-28 05:36:31 +00:00
Saeed Maleki
d9ba953fb0 gdrcopy is not initialized 2023-03-28 04:56:06 +00:00
Saeed Maleki
e7cccbf897 both head and tail are on OK to be only used by GPU 2023-03-28 04:26:39 +00:00
Saeed Maleki
952d852256 both head and tail are on OK to be only used by GPU 2023-03-28 04:24:45 +00:00
Saeed Maleki
43c52367fb merged with main and simplified the callback requirements 2023-03-27 23:41:27 +00:00
Saeed Maleki
19bf369dc1 link format correction 2023-03-27 20:40:15 +00:00
Changho Hwang
8fc8f5b4fe Lint 2023-03-27 14:09:26 +00:00
Changho Hwang
8e4146aba9 Add mscclppSetLogHandler 2023-03-27 13:33:07 +00:00
Saeed Maleki
0898214f0a added mscclppGetErrorString 2023-03-24 22:57:14 +00:00
Saeed Maleki
3fb9383621 Merge pull request #24 from microsoft/madanm-apipush 2023-03-24 10:42:07 -07:00
    simplified API for CUDA level communication calls.
Saeed Maleki
56b599b5e7 a bit of api change and clean up on docs 2023-03-24 17:41:04 +00:00
Changho Hwang
551eae0ba1 Update docs 2023-03-24 09:28:12 +00:00
Ziyue Yang
f92b428cba Port NPKit 2023-03-24 06:41:16 +00:00
Changho Hwang
05fde6c6f3 minor changes 2023-03-24 04:51:20 +00:00
Saeed Maleki
777e93ee47 merged with main 2023-03-24 02:35:15 +00:00
Madan Musuvathi
e6ee81e4fa fixed the order of remote rank and tag in mscclppConnect API 2023-03-23 21:09:04 +00:00
Madan Musuvathi
72edabe2a6 added GetDevConn api to retrieve a connection from remoteRank and tag 2023-03-23 21:03:30 +00:00
Madan Musuvathi
896539b236 Comm owns all state including devcons 2023-03-22 22:43:32 +00:00