Commit Graph

470 Commits

Author SHA1 Message Date
Saeed Maleki
ef558a42e8 wip 2023-05-12 05:54:32 +00:00
Binyang2014
643771bf93 Merge pull request #71 from microsoft/binyli/merge-main
Resolve conflict and merge main branch to api-extension
2023-05-11 17:39:06 +08:00
Binyang Li
e63aae7142 Merge apt-extension 2023-05-11 09:20:41 +00:00
Olli Saarikivi
96a0c45fb4 Remove makefile 2023-05-11 00:23:21 +00:00
Olli Saarikivi
9f6c48cbf9 Format all files 2023-05-11 00:23:14 +00:00
Olli Saarikivi
ccf45b33a2 Delete old init code and other C-style code 2023-05-10 22:03:42 +00:00
Olli Saarikivi
b2dfd8a8fe Merge branch 'api-extension' of https://github.com/microsoft/mscclpp into api-extension 2023-05-10 20:50:51 +00:00
Olli Saarikivi
beaf2aea39 Move public headers under include/ 2023-05-10 20:46:49 +00:00
Saeed Maleki
c05586f074 Merge branch 'api-extension' of https://github.com/microsoft/mscclpp into api-extension 2023-05-10 20:24:40 +00:00
Saeed Maleki
33eb4093ac timeout fix 2023-05-10 20:24:33 +00:00
Olli Saarikivi
f4ecae7c96 Rename tests/ to test/ 2023-05-10 18:49:02 +00:00
Olli Saarikivi
75a2af8de2 Add GoogleTest with CTest integration + some tests
Also rename addSetup to onSetup to unify naming.
2023-05-10 18:46:55 +00:00
Olli Saarikivi
4045323aa2 Merge branch 'saemal/api-extension' into api-extension 2023-05-10 15:30:10 +00:00
Binyang Li
b948ed6bfd Merge branch 'main' into binyli/merge-main 2023-05-10 06:02:22 +00:00
Binyang2014
f8c1dc64da Update sm copy test (#70)
Result for 1 KB message:
```
# Launching MSCCL++ proxy threads
#
#                                    in-place                       out-of-place          
#       size         count     time   algbw   busbw  #wrong     time   algbw   busbw  #wrong
#        (B)    (elements)     (us)  (GB/s)  (GB/s)            (us)  (GB/s)  (GB/s)       
        1024           256                                      8.34    0.12    0.12      0
Stopping MSCCL++ proxy threads
# Out of bounds values : 0 OK
```

Result for 1 GB message:
```
#                                    in-place                       out-of-place          
#       size         count     time   algbw   busbw  #wrong     time   algbw   busbw  #wrong
#        (B)    (elements)     (us)  (GB/s)  (GB/s)            (us)  (GB/s)  (GB/s)       
  1073741824     268435456                                    5716.9  187.82  187.82      0
Stopping MSCCL++ proxy threads
# Out of bounds values : 0 OK
```
For the 1 KB message, latency is better than NCCL's (16.68 us). For the 1 GB message, bandwidth is slightly worse than NCCL's (190.74 GB/s).
2023-05-10 13:56:18 +08:00
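The algbw figures quoted in commit f8c1dc64da can be cross-checked from the size and time columns. The sketch below assumes algbw is simply message size divided by elapsed time, with 1 GB taken as 1e9 bytes; the helper name `algbw_gbps` is hypothetical and not part of MSCCL++.

```python
def algbw_gbps(size_bytes: int, time_us: float) -> float:
    """Algorithm bandwidth in GB/s, assuming algbw = size / time and 1 GB = 1e9 bytes."""
    seconds = time_us * 1e-6
    return size_bytes / seconds / 1e9


# (size in bytes, time in microseconds) taken from the out-of-place columns above
runs = {
    "1 KB": (1024, 8.34),
    "1 GB": (1073741824, 5716.9),
}

for label, (size_bytes, time_us) in runs.items():
    print(f"{label}: {algbw_gbps(size_bytes, time_us):.2f} GB/s")

# Comparison against the NCCL numbers quoted in the commit message
latency_ratio = 8.34 / 16.68        # MSCCL++ latency relative to NCCL at 1 KB
bandwidth_ratio = 187.82 / 190.74   # MSCCL++ bandwidth relative to NCCL at 1 GB
print(f"latency ratio: {latency_ratio:.2f}, bandwidth ratio: {bandwidth_ratio:.3f}")
```

Under these assumptions the computed values round to the 0.12 GB/s and 187.82 GB/s shown in the tables, and the ratios express the NCCL comparison as fractions rather than absolute numbers.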
Saeed Maleki
1769138568 Host Epoch + Error code 2023-05-09 23:10:12 +00:00
Saeed Maleki
8b384600a9 host epoch works 2023-05-09 22:17:43 +00:00
Binyang2014
bbf7ef621e Enable github action on all branches (#68) 2023-05-09 23:07:13 +09:00
Binyang Li
9c40d616d9 Merge main branch 2023-05-09 10:59:04 +00:00
Binyang2014
8650dbaff8 Add exception class for mscclpp (#67)
Add exception class for mscclpp
2023-05-06 16:27:25 +08:00
Olli Saarikivi
4f528d29a0 Make clang-format style file explicit 2023-05-05 19:15:38 +00:00
Olli Saarikivi
86be901d98 CMake improvements 2023-05-05 19:11:33 +00:00
Olli Saarikivi
051643b4c2 Fix clang-format glob 2023-05-05 18:13:34 +00:00
Olli Saarikivi
adaa75536d Add clang-format to CMake 2023-05-05 18:05:55 +00:00
Binyang Li
669c67b3de enable github action on all branches 2023-05-05 08:42:25 +00:00
Saeed Maleki
9fb29f9dfc timeout for flush 2023-05-04 17:48:24 +00:00
Binyang Li
bb3239fd6b Fix IB write issue 2023-05-04 11:03:45 +00:00
Saeed Maleki
9ecf1f9945 Merge pull request #66 from microsoft/olli/api-extension
Olli/api extension
2023-05-03 19:47:03 -07:00
Olli Saarikivi
ddc9e681c8 Add ib_test to CMake 2023-05-04 00:57:34 +00:00
Olli Saarikivi
d7103602ac Only build C++ tests in CMake 2023-05-04 00:55:35 +00:00
Olli Saarikivi
bd2121a2ef CMake improvement 2023-05-04 00:53:50 +00:00
Olli Saarikivi
09d5f7c12e Fixes for cmake 2023-05-04 00:39:30 +00:00
Olli Saarikivi
503cdd5c7e CMake build system transition WIP 2023-05-03 23:52:13 +00:00
Saeed Maleki
518f325225 kernel 2 is also performant 2023-05-03 22:45:47 +00:00
Saeed Maleki
7af687954c removing old mscclppComm_t comm from communicator 2023-05-03 20:23:51 +00:00
Olli Saarikivi
4a41c19e72 Fix performance bug and base pointer offset 2023-05-03 19:40:23 +00:00
Olli Saarikivi
39666f999f Quick fix 2023-05-03 19:20:45 +00:00
Olli Saarikivi
81e7d1b344 Channels work 2023-05-03 17:11:25 +00:00
Saeed Maleki
6002a520b6 solved merge conflict 2023-05-02 23:56:11 +00:00
Saeed Maleki
54d1e1872c testing writes with signal is passing 2023-05-02 23:53:31 +00:00
Olli Saarikivi
4ba8516832 allgather_test_cpp functional again 2023-05-02 23:14:13 +00:00
Saeed Maleki
fc12947c5b fixing flush for IB 2023-05-02 21:42:25 +00:00
Saeed Maleki
a4e6ffe2bc epoch creation 2023-05-02 21:39:43 +00:00
Olli Saarikivi
c44b48b361 Epoch non-copyable 2023-05-02 21:38:26 +00:00
Olli Saarikivi
66ce01baf3 Make NonblockingFuture copyable 2023-05-02 20:46:30 +00:00
Olli Saarikivi
c7b7d20d85 Export epoch header 2023-05-02 20:35:16 +00:00
Olli Saarikivi
358c3d62b8 Generalize connectionSetup() into setup() 2023-05-02 20:06:30 +00:00
Saeed Maleki
fe2b778abc flushing the full cq 2023-05-02 03:50:57 +00:00
Saeed Maleki
6aa023ed1e moving serializer outside 2023-05-02 03:28:09 +00:00
Saeed Maleki
961f5b38dd more debugging info + testing 1000 memory registrations 2023-05-02 00:44:13 +00:00