Commit Graph

504 Commits

Author SHA1 Message Date
Changho Hwang
0c14a67ad2 [mscclpp-test] Add AllReduce and AllToAll tests (#83) 2023-06-07 10:58:47 +00:00
Binyang2014
d9568a3235 Setup Azure pipeline to run mscclpp-test (#93) 2023-06-07 14:14:49 +08:00
Changho Hwang
7346e70109 Use MSCCL++ Docker image for CodeQL (#94) 2023-06-06 18:42:22 +08:00
Changho Hwang
85e664c2f7 Update docs (#88) 2023-06-05 13:13:10 +08:00
Changho Hwang
9cee6c4a74 Cleanup old files and functions (#86) 2023-06-01 17:34:57 +08:00
Olli Saarikivi
457c422791 Remove alloc.h and beef up cuda_utils.hpp (#82) 2023-05-24 08:34:18 +00:00
Binyang2014
216373eab2 Add allgather test to mscclpp-test (#78)
Finish allGather

Co-authored-by: Changho Hwang <changhohwang@microsoft.com>
2023-05-23 00:37:25 -07:00
Changho Hwang
0581bfb431 Fix CodeQL workflow (#80) 2023-05-22 14:03:30 +08:00
Changho Hwang
8d54bf3301 Update CI (#79) 2023-05-21 11:45:41 -07:00
Binyang2014
a3cf48cc5d Rewrite mscclpp-test with cpp style API (#77)
- Rewrite mscclpp-test with cpp style API
- Add SM copy
- add new sendRecv test
2023-05-19 14:14:19 +08:00
Olli Saarikivi
4c0883bc91 Add a missing throw 2023-05-16 16:16:00 -07:00
Olli Saarikivi
4e4d1972e3 Cuda smart pointers 2023-05-16 16:16:00 -07:00
Olli Saarikivi
00d4896c25 Rudimentary CTest support for test executables 2023-05-16 16:16:00 -07:00
Olli Saarikivi
d83343ef4e Make getWc not return a void pointer 2023-05-16 22:52:38 +00:00
Olli Saarikivi
dee55997e9 Remove free and most reinterpret_casts in IB code 2023-05-16 22:48:16 +00:00
Saeed Maleki
5de083ad7e freeing cudaMalloc'ed pointers 2023-05-15 23:53:30 +00:00
Saeed Maleki
966402706c Merge pull request #72 from microsoft/ziyyang/doxygen
Add Doxygen-based document
2023-05-15 16:50:17 -07:00
Saeed Maleki
e21392e2c3 Merge branch 'main' into ziyyang/doxygen 2023-05-15 23:45:54 +00:00
Saeed Maleki
112d1eeb22 Merge pull request #75 from microsoft/api-extension
Merge api-extension branch to main
2023-05-15 16:35:50 -07:00
Saeed Maleki
c9ac615b20 Merge pull request #74 from microsoft/saemal/offloading
offloading allgather to CPU entirely
2023-05-15 16:27:00 -07:00
Saeed Maleki
6f7ca05305 Merge remote-tracking branch 'origin/api-extension' into saemal/offloading 2023-05-12 22:43:22 +00:00
Saeed Maleki
2a7b745972 fully working with double buffering 2023-05-12 22:42:22 +00:00
Olli Saarikivi
8f2d7922ed Change install dir 2023-05-12 21:25:29 +00:00
Olli Saarikivi
d58e698d51 Add headers to install and set default install dir 2023-05-12 21:23:01 +00:00
Saeed Maleki
2691784b88 working -- at least for single node 2023-05-12 20:21:58 +00:00
Saeed Maleki
113473a116 more progress 2023-05-12 07:01:21 +00:00
Saeed Maleki
31851ad82c host epoch removed 2023-05-12 06:11:12 +00:00
Saeed Maleki
ef558a42e8 wip 2023-05-12 05:54:32 +00:00
Saeed Maleki
260c3e35f0 Merge pull request #73 from microsoft/binyli/exception
Refine exception
2023-05-11 14:29:41 -07:00
Saeed Maleki
62f96f316c Merge branch 'api-extension' into binyli/exception 2023-05-11 21:24:18 +00:00
Binyang2014
643771bf93 Merge pull request #71 from microsoft/binyli/merge-main
Resolve conflict and merge main branch to api-extension
2023-05-11 17:39:06 +08:00
Binyang Li
e63aae7142 Merge apt-extension 2023-05-11 09:20:41 +00:00
Binyang Li
5704fb7c6a update 2023-05-11 08:55:51 +00:00
Binyang Li
1487596dc8 update cpplint 2023-05-11 08:34:57 +00:00
Binyang Li
785a973ace refine exception 2023-05-11 08:25:25 +00:00
Ziyue Yang
e257f19cb8 add doc section in readme 2023-05-11 00:46:02 +00:00
Olli Saarikivi
96a0c45fb4 Remove makefile 2023-05-11 00:23:21 +00:00
Olli Saarikivi
9f6c48cbf9 Format all files 2023-05-11 00:23:14 +00:00
Olli Saarikivi
ccf45b33a2 Delete old init code and other C-style code 2023-05-10 22:03:42 +00:00
Olli Saarikivi
b2dfd8a8fe Merge branch 'api-extension' of https://github.com/microsoft/mscclpp into api-extension 2023-05-10 20:50:51 +00:00
Olli Saarikivi
beaf2aea39 Move public headers under include/ 2023-05-10 20:46:49 +00:00
Saeed Maleki
c05586f074 Merge branch 'api-extension' of https://github.com/microsoft/mscclpp into api-extension 2023-05-10 20:24:40 +00:00
Saeed Maleki
33eb4093ac timeout fix 2023-05-10 20:24:33 +00:00
Olli Saarikivi
f4ecae7c96 Rename tests/ to test/ 2023-05-10 18:49:02 +00:00
Olli Saarikivi
75a2af8de2 Add GoogleTest with CTest integration + some tests
Also rename addSetup to onSetup to unify naming.
2023-05-10 18:46:55 +00:00
Ziyue Yang
48a278d2a5 init doxyfile 2023-05-10 16:23:02 +00:00
Olli Saarikivi
4045323aa2 Merge branch 'saemal/api-extension' into api-extension 2023-05-10 15:30:10 +00:00
Binyang Li
b948ed6bfd Merge branch 'main' into binyli/merge-main 2023-05-10 06:02:22 +00:00
Binyang2014
f8c1dc64da Update sm copy test (#70)
Result for 1 KB message:
```
# Launching MSCCL++ proxy threads
#
#                                    in-place                       out-of-place          
#       size         count     time   algbw   busbw  #wrong     time   algbw   busbw  #wrong
#        (B)    (elements)     (us)  (GB/s)  (GB/s)            (us)  (GB/s)  (GB/s)       
        1024           256                                      8.34    0.12    0.12      0
Stopping MSCCL++ proxy threads
# Out of bounds values : 0 OK
```

Result for 1 GB message:
```
#                                    in-place                       out-of-place          
#       size         count     time   algbw   busbw  #wrong     time   algbw   busbw  #wrong
#        (B)    (elements)     (us)  (GB/s)  (GB/s)            (us)  (GB/s)  (GB/s)       
  1073741824     268435456                                    5716.9  187.82  187.82      0
Stopping MSCCL++ proxy threads
# Out of bounds values : 0 OK
```
For the 1 KB message, the latency (8.34 us) is better than NCCL's 16.68 us. For the 1 GB message, the bandwidth (187.82 GB/s) is slightly worse than NCCL's 190.74 GB/s.
2023-05-10 13:56:18 +08:00
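
The bandwidth figures reported in the sm copy test results of #70 above can be cross-checked from the size and time columns. The following is a minimal sketch, assuming `algbw` is simply message size divided by elapsed time with 1 GB = 1e9 bytes; the helper name `algbw_gbps` is hypothetical and is not part of mscclpp-test.

```python
def algbw_gbps(size_bytes: int, time_us: float) -> float:
    """Return bandwidth in GB/s, assuming algbw = size / time and 1 GB = 1e9 bytes."""
    seconds = time_us * 1e-6
    return size_bytes / seconds / 1e9


# (size in bytes, out-of-place time in microseconds), copied from the two result tables
runs = {
    "1 KB": (1024, 8.34),
    "1 GB": (1073741824, 5716.9),
}

for label, (size, time_us) in runs.items():
    print(f"{label}: {algbw_gbps(size, time_us):.2f} GB/s")
```

Under that assumption the computed values agree with the `algbw` column in both tables (0.12 GB/s and 187.82 GB/s). In these rows the reported `busbw` is identical to `algbw`.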
Saeed Maleki
1769138568 Host Epoch + Error code 2023-05-09 23:10:12 +00:00