Commit Graph

263 Commits

Author SHA1 Message Date
Crutcher Dunnavant
d014693288 cleanup tests 2023-04-07 11:37:24 -07:00
Crutcher Dunnavant
68eff98bbc update ci.sh 2023-04-07 11:27:45 -07:00
Crutcher Dunnavant
e65def8657 bug 2023-04-07 11:27:45 -07:00
Crutcher Dunnavant
7753c38eb1 working on connect 2023-04-07 11:27:45 -07:00
Changho Hwang
b6ea0ca266 IB unit test (#47) 2023-04-07 21:45:14 +08:00
Changho Hwang
b7461facff Fix Makefile 2023-04-07 13:09:56 +00:00
Changho Hwang
949a9cd0a3 Optional use of gdrcopy (#48)
Co-authored-by: Saeed Maleki <saemal@microsoft.com>
2023-04-07 13:36:59 +08:00
Saeed Maleki
6c1ebed569 combining ./python and ./ lint formats into makefile 2023-04-06 23:26:56 +00:00
Ziyue Yang
352a10a33d NPKit: improve event collection for async requests (#45) 2023-04-06 16:21:34 +08:00
Saeed Maleki
cd3cd2c157 lint 2023-04-06 03:20:21 +00:00
Saeed Maleki
08275e93d7 added barrier API + pushed one after mscclppsetup 2023-04-06 03:15:54 +00:00
Saeed Maleki
ef851d2557 Merge pull request #44 from microsoft/crutcher-fixformat
Fixup formating for python
2023-04-05 16:28:37 -07:00
Crutcher Dunnavant
f7e330da21 Fixup formating for python 2023-04-05 21:47:08 +00:00
Saeed Maleki
fa7d2ad877 Merge pull request #43 from microsoft/crutcher-bootstrap
[python] Pull in bootstrap all gather, log callbacks.
2023-04-04 18:43:22 -07:00
Saeed Maleki
5fe87e5da6 lint fixes 2023-04-05 01:41:21 +00:00
Crutcher Dunnavant
151b29f70c docs and format 2023-04-04 18:55:08 +00:00
Crutcher Dunnavant
659a88a767 remove env hook 2023-04-04 18:31:08 +00:00
Crutcher Dunnavant
aaf93c858d extract level 2023-04-04 18:29:45 +00:00
Crutcher Dunnavant
0df50830e1 log callbacks 2023-04-04 17:57:46 +00:00
Crutcher Dunnavant
423affeaa6 all gather bytes, json, pickle 2023-04-03 23:39:06 +00:00
Crutcher Dunnavant
17e1885981 allocation fixes 2023-04-03 23:39:06 +00:00
Crutcher Dunnavant
8cac41c8ac [python] working on bootstrap all gather bug 2023-04-03 23:39:06 +00:00
Saeed Maleki
2c6460ce72 bug fix for allgather0 2023-04-03 04:36:20 +00:00
Saeed Maleki
bfbdaf6b05 Merge pull request #41 from microsoft/saemal/allgather_hier
Saemal/allgather hier
2023-04-01 20:37:43 -07:00
Saeed Maleki
5ff64d36f4 documents for allgather2 + refactoring local allgather 2023-04-02 03:36:22 +00:00
Saeed Maleki
0887cfe768 no need for remapping anymore 2023-04-02 02:35:08 +00:00
Saeed Maleki
5cf3f3c524 Merge pull request #39 from microsoft/binyli/numabindAPI
bind proxy threads and host allocation are now done automatically and closest to the device
2023-04-01 19:16:43 -07:00
Saeed Maleki
1731911d00 removing extra stream and destroying created ones 2023-04-02 02:07:41 +00:00
Saeed Maleki
4c6616e7b9 lint 2023-04-01 19:20:50 +00:00
Saeed Maleki
8927dd4d72 great allgather numbers with the current binding mechanism 2023-04-01 18:54:42 +00:00
Saeed Maleki
701255959e lint 2023-03-31 23:34:43 +00:00
Saeed Maleki
97dadd8d64 merged with main 2023-03-31 23:32:01 +00:00
Binyang Li
8e4edd4d83 fix lint 2023-03-31 08:32:45 +00:00
Binyang Li
af5825b474 bind numa node to communicator 2023-03-31 08:05:49 +00:00
Saeed Maleki
44e8760af7 allgather kernel2 2023-03-31 06:31:25 +00:00
Changho Hwang
fe1d7fee9e Bug Fix: null-termination in logging 2023-03-31 05:25:07 +00:00
Saeed Maleki
fef0bff945 a third kernel for allgather cross-node 2023-03-30 23:24:04 +00:00
Saeed Maleki
29254439e5 Merge pull request #38 from microsoft/saemal/removing_gdrcopy
removing gdrcopy and adding flush functionality
2023-03-30 13:17:47 -07:00
Changho Hwang
b58eae4037 Minor changes 2023-03-30 07:11:41 +00:00
Saeed Maleki
e2cfd5ac83 a lot of documentation 2023-03-30 00:37:33 +00:00
Saeed Maleki
be5e422021 merged with main 2023-03-29 23:03:12 +00:00
Saeed Maleki
4be1dcc5d6 Merge pull request #37 from microsoft/saemal/removing_gdrcopy_flush
flush capability added in this PR
2023-03-29 15:44:30 -07:00
Saeed Maleki
629d59a9c0 bug fix -- flush doesn't need to increment the epoch 2023-03-29 22:21:24 +00:00
Saeed Maleki
debd110874 fused flush instructions 2023-03-29 22:17:02 +00:00
Saeed Maleki
42b11c5c9a fast flush 2023-03-29 20:50:01 +00:00
Saeed Maleki
d97bee6973 flush mechanism 2023-03-29 17:31:20 +00:00
Bin Wang
7880be8ee2 Fix the 2 GiB limit in allgather test. (#36) 2023-03-29 19:02:43 +08:00
Saeed Maleki
7a0962e4be Merge pull request #35 from microsoft/binyli/refactor
Refactor mscclppProxyService
2023-03-28 13:46:16 -07:00
Binyang Li
d725e45f13 fix 2023-03-28 14:53:08 +00:00
Binyang Li
9c633a9633 bug fix 2023-03-28 14:40:51 +00:00