Changho Hwang
|
6cd8960394
|
DirectChannel Unit Tests (#102)
* Add DirectChannel unit tests
* Split mp_unit_tests.cu into multiple files
|
2023-06-15 20:55:57 +08:00 |
|
Changho Hwang
|
c4a5958dfc
|
Fix hanging bootstrap issues (#100)
* Renew socket interfaces and error handling into C++ style
* Fix bootstrap hanging bugs
* Misc code cleanup
---------
Co-authored-by: Binyang Li <binyli@microsoft.com>
Co-authored-by: Saeed Maleki <saemal@microsoft.com>
|
2023-06-15 11:29:49 +08:00 |
|
Binyang2014
|
b1ce368656
|
Implement host offload algorithm for allgather (#84)
For 1n-8p
```
# Initializing MSCCL++
# Setting up the connection in MSCCL++
#
#                                     in-place                       out-of-place
#       size         count     time   algbw   busbw  #wrong     time   algbw   busbw  #wrong
#        (B)    (elements)     (us)  (GB/s)  (GB/s)             (us)  (GB/s)  (GB/s)
        1024            32    73.02    0.01    0.01       0
# Out of bounds values : 0 OK
#
```
For 2n-16p
```
# Initializing MSCCL++
# Setting up the connection in MSCCL++
#
#                                     in-place                       out-of-place
#       size         count     time   algbw   busbw  #wrong     time   algbw   busbw  #wrong
#        (B)    (elements)     (us)  (GB/s)  (GB/s)             (us)  (GB/s)  (GB/s)
        1024            16    90.30    0.01    0.01       0
# Out of bounds values : 0 OK
#
```
|
2023-06-13 10:01:58 +00:00 |
|
Changho Hwang
|
76718e4015
|
Saemal/atomic signal (#96)
* code complete
* fix correctness issue
* Fix correctness issue
* fix lint
* pass compile
* Fix build issue
* Fix runtime error
* Fix correctness issue
* Fix crash issue
* minor change
* Fix memory leak
* Fix review comments
* Finish allgather
* address comments
* load element to register first then store to remote address
* Finish allGather
* init
* Build connections
* allreduce_test works
* Bug fix
* Add CUDA flags
* Add packet copy (LL)
* Lint
* Set tmpPtr from constructors
* Lint
* Multiple blocks per peer
* Beautify
* Temporal ring reduce
* Ring reduce works correctly
* Overlapping
* Fix overlapping
* Improve vector sum
* figuring out how to use atomics
* working now
* wip
* Enhance LL AllReduce
* Support multiple blocks per peer
* Fix a ring reduce bug
* Fix an AllReduce kernel 2 bug
* Bug fix
* wip
* Make it compilable
* Lint
* Lint
* Minor changes
* Unit test to reproduce memory consistency bugs
* Unit test bug fixes
* Fixes
* Typo
* wip
* done with core
* wip
* wip
* compiles
* only the atomic is failing
* almost working
* all tests pass now
* clang-12
* More jailbreaks
* bug fix for common.cu
* adding stdint to concurrency.hpp
* Out-of-place for AllReduce kernel 2
* Optimize `sync()`
* Fix mp_unit_tests
* Init TestEngine with TestArgs
* Change common.cu into common.cc
* Cleanup common.hpp
* Lint
* fixes to the mscclpp-tests
* fixed common.cc
---------
Co-authored-by: Binyang Li <binyli@microsoft.com>
Co-authored-by: Saeed Maleki <saemal@microsoft.com>
|
2023-06-12 21:38:06 -07:00 |
|
Changho Hwang
|
43de015f3f
|
Add packet copy (LL) for AllReduce (#85)
|
2023-06-12 21:53:50 +08:00 |
|
Changho Hwang
|
5a4885ccbb
|
Misc updates (#95)
|
2023-06-12 13:53:43 +08:00 |
|
Changho Hwang
|
798631bd52
|
Update unit tests (#81)
|
2023-06-08 09:58:05 +00:00 |
|
Changho Hwang
|
0c14a67ad2
|
[mscclpp-test] Add AllReduce and AllToAll tests (#83)
|
2023-06-07 10:58:47 +00:00 |
|
Changho Hwang
|
9cee6c4a74
|
Cleanup old files and functions (#86)
|
2023-06-01 17:34:57 +08:00 |
|
Olli Saarikivi
|
457c422791
|
Remove alloc.h and beef up cuda_utils.hpp (#82)
|
2023-05-24 08:34:18 +00:00 |
|
Binyang2014
|
a3cf48cc5d
|
Rewrite mscclpp-test with cpp style API (#77)
- Rewrite mscclpp-test with cpp style API
- Add SM copy
- add new sendRecv test
|
2023-05-19 14:14:19 +08:00 |
|
Olli Saarikivi
|
4c0883bc91
|
Add a missing throw
|
2023-05-16 16:16:00 -07:00 |
|
Olli Saarikivi
|
4e4d1972e3
|
Cuda smart pointers
|
2023-05-16 16:16:00 -07:00 |
|
Olli Saarikivi
|
d58e698d51
|
Add headers to install and set default install dir
|
2023-05-12 21:23:01 +00:00 |
|
Binyang Li
|
785a973ace
|
refine exception
|
2023-05-11 08:25:25 +00:00 |
|
Olli Saarikivi
|
9f6c48cbf9
|
Format all files
|
2023-05-11 00:23:14 +00:00 |
|
Olli Saarikivi
|
beaf2aea39
|
Move public headers under include/
|
2023-05-10 20:46:49 +00:00 |
|