caiomcbr
7493e2f075
Double buffering for NCCL APIs (#324)
Using two scratch buffers in each peer to exchange data.
---------
Co-authored-by: Changho Hwang <changhohwang@microsoft.com>
2024-07-15 22:18:53 +00:00
Binyang Li
422c81f0f8
remove make pylib-copy command (#249)
Fix #216
Remove `make pylib-copy`
2024-01-19 12:29:15 -08:00
Changho Hwang
5fa5bd2706
Check nvidia_peermem during runtime (#234)
2023-12-25 12:02:10 +08:00
Changho Hwang
c15a166cf0
Add a documentation issue template (#230)
2023-12-05 01:01:45 +00:00
Changho Hwang
544ff0c21d
ROCm support (#213)
Co-authored-by: Binyang Li <binyli@microsoft.com>
2023-11-24 16:41:56 +08:00
Changho Hwang
dab19e00c1
Templatize Dockerfiles & update workflows (#223)
Images are now built by a script from a shared Dockerfile template
---------
Co-authored-by: Binyang Li <binyli@microsoft.com>
Co-authored-by: Saeed Maleki <saemal@microsoft.com>
2023-11-22 13:29:12 -08:00
Changho Hwang
f68820436c
Explicit build dependency on nvidia_peermem (#201)
2023-10-23 04:29:30 +00:00
Changho Hwang
8c0f9e84d0
v0.3.0 (#171)
2023-10-11 22:35:54 +08:00
Changho Hwang
11ac824cc7
Align interfaces of put/get/putPackets/getPackets (#185)
2023-10-07 22:18:26 +08:00
Changho Hwang
497a9e0c82
Add backup workflows (#189)
2023-10-07 15:13:49 +08:00
Changho Hwang
bb64f68d74
Update issue templates (#179)
2023-09-15 04:05:09 +00:00
Saeed Maleki
e7d5e652df
Python bindings (#125)
Co-authored-by: Olli Saarikivi <olsaarik@microsoft.com>
Co-authored-by: Changho Hwang <changhohwang@microsoft.com>
Co-authored-by: Binyang Li <binyli@microsoft.com>
2023-07-19 15:35:54 +08:00
Binyang2014
56bdbc2f32
Enable test for both cuda11 and cuda12 (#124)
Update pipeline: enable test for both cuda11 and cuda12
2023-07-10 13:19:14 +08:00
Changho Hwang
4114d65c60
Documents & minor updates (#119)
Co-authored-by: Saeed Maleki <saemal@microsoft.com>
Co-authored-by: Binyang Li <binyli@microsoft.com>
2023-07-07 17:35:05 +08:00
Changho Hwang
bb7b85a810
2-node AllReduce improvements (#118)
* Added `get()` interfaces to `SmChannel`
* Improved 2-node (8 GPUs/node) AllReduce: algbw 139 GB/s for 1 GB (kernel 3) and 99 GB/s for 48 MB (kernel 4)
* Fixed a FIFO perf bug
* Several fixes & validations in mscclpp-test
---------
Co-authored-by: Binyang Li <binyli@microsoft.com>
Co-authored-by: Saeed Maleki <saemal@microsoft.com>
2023-07-07 07:05:46 +00:00
Binyang2014
2640578b22
Add performance check for mscclpp-test (#110)
- Add ndmv4 perf baseline
- Change mscclpp-test to write perf numbers to a JSON file
- Add a Python script to check the perf results against the baseline
2023-06-21 07:42:53 +00:00
Changho Hwang
5a4885ccbb
Misc updates (#95)
2023-06-12 13:53:43 +08:00
Changho Hwang
798631bd52
Update unit tests (#81)
2023-06-08 09:58:05 +00:00
Changho Hwang
7346e70109
Use MSCCL++ Docker image for CodeQL (#94)
2023-06-06 18:42:22 +08:00
Changho Hwang
0581bfb431
Fix CodeQL workflow (#80)
2023-05-22 14:03:30 +08:00
Changho Hwang
8d54bf3301
Update CI (#79)
2023-05-21 11:45:41 -07:00
Binyang Li
5704fb7c6a
update
2023-05-11 08:55:51 +00:00
Binyang Li
1487596dc8
update cpplint
2023-05-11 08:34:57 +00:00
Binyang Li
669c67b3de
enable github action on all branches
2023-05-05 08:42:25 +00:00
Changho Hwang
72431957fd
Use clang-format-12
2023-03-27 14:00:03 +00:00
Binyang Li
7ec6ae9d6a
add cpplint and CI
2023-03-27 03:32:10 +00:00