mirror of
https://github.com/microsoft/mscclpp.git
synced 2026-05-12 17:26:04 +00:00
117 lines
6.3 KiB
Markdown
117 lines
6.3 KiB
Markdown
# MSCCL++
|
|
|
|
GPU-driven computation & communication stack.
|
|
|
|
## Quick Start
|
|
|
|
### Preliminaries
|
|
|
|
- OS: tested over Ubuntu 18.04 and 20.04
|
|
- Libraries: CUDA >= 11.1.1, [libnuma](https://github.com/numactl/numactl)
|
|
- GPUs: A100 (TBU: H100)
|
|
- Azure SKUs: [ND_A100_v4](https://learn.microsoft.com/en-us/azure/virtual-machines/nda100-v4-series), [NDm_A100_v4](https://learn.microsoft.com/en-us/azure/virtual-machines/ndm-a100-v4-series) (TBD: [NC_A100_v4](https://learn.microsoft.com/en-us/azure/virtual-machines/nc-a100-v4-series))
|
|
|
|
|
|
### Compile Library
|
|
|
|
Run `make` in the top directory. To use MPI for test code, pass `MPI_HOME` (`/usr/local/mpi` by default). For example:
|
|
|
|
```
|
|
$ MPI_HOME=/usr/local/mpi make -j
|
|
```
|
|
|
|
If you do not want to use MPI, pass `USE_MPI_FOR_TESTS=0`.
|
|
|
|
```
|
|
# Do not use MPI
|
|
$ USE_MPI_FOR_TESTS=0 make -j
|
|
```
|
|
|
|
`make` will create a header file `build/include/mscclpp.h` and a shared library `build/lib/libmscclpp.so`.
|
|
|
|
### (Optional) Tests
|
|
|
|
For verification, one can try provided sample code `bootstrap_test` or `p2p_test`. First add the MSCCL++ library path to `LD_LIBRARY_PATH`.
|
|
|
|
```
|
|
$ export LD_LIBRARY_PATH=$PWD/build/lib:$LD_LIBRARY_PATH
|
|
```
|
|
|
|
Run tests using MPI:
|
|
|
|
```
|
|
$ mpirun -np 8 ./build/bin/tests/bootstrap_test 127.0.0.1:50000
|
|
$ mpirun -np 8 ./build/bin/tests/p2p_test 127.0.0.1:50000
|
|
```
|
|
|
|
If tests are compiled without MPI, pass a rank and the number of ranks as the following example. Usage of `p2p_test` is also the same as `bootstrap_test`.
|
|
|
|
```
|
|
# Terminal 1: Rank 0, #Ranks 2
|
|
$ ./build/bin/tests/bootstrap_test 127.0.0.1:50000 0 2
|
|
# Terminal 2: Rank 1, #Ranks 2
|
|
$ ./build/bin/tests/bootstrap_test 127.0.0.1:50000 1 2
|
|
```
|
|
|
|
## Performance
|
|
|
|
All results from NDv4. "xp-yn" means "x" total GPUs across "y" nodes.
|
|
|
|
**NOTE:** NCCL AllGather leverages Ring algorithm instead of all-pairs alike algorithm, which greatly reduces inter-node transmission, causing significant higher performance. MSCCL++ should do something similar in the future
|
|
|
|
### 8p-1n
|
|
**Latency (us)**
|
|
| Message Size | NCCL AllGather | NCCL AllToAll | MSCCL AllToAll LL | MSCCL AllToAll LL128 | MSCCL AllToAll Simple | MSCCL++ AllGather K0 | MSCCL++ AllGather K1 |
|
|
|:------------:|:--------------:|:-------------:|:-----------------:|:--------------------:|:---------------------:|:--------------------:|:--------------------:|
|
|
| 1K | 13.12 | 9.61 | **7.76** | 21.06 | 28.50 | 157.91 | 143.21 |
|
|
|
|
**BusBW (GB/s)**
|
|
| Message Size | NCCL AllGather | NCCL AllToAll | MSCCL AllToAll LL | MSCCL AllToAll LL128 | MSCCL AllToAll Simple | MSCCL++ AllGather K0 | MSCCL++ AllGather K1 |
|
|
|:------------:|:--------------:|:-------------:|:-----------------:|:--------------------:|:---------------------:|:--------------------:|:--------------------:|
|
|
| 1G | 218.27 | 220.09 | 217.05 | 216.98 | 217.15 | 93.69 | **255.06** |
|
|
|
|
### 2p-2n
|
|
**Latency (us)**
|
|
| Message Size | NCCL AllGather | NCCL AllToAll | MSCCL AllToAll LL | MSCCL AllToAll LL128 | MSCCL AllToAll Simple | MSCCL++ AllGather K0 | MSCCL++ AllGather K1 |
|
|
|:------------:|:--------------:|:-------------:|:-----------------:|:--------------------:|:---------------------:|:--------------------:|:--------------------:|
|
|
| 1K | 15.31 | 28.36 | 14.67 | 29.12 | 35.43 | 15.32 | **13.84** |
|
|
|
|
**BusBW (GB/s)**
|
|
| Message Size | NCCL AllGather | NCCL AllToAll | MSCCL AllToAll LL | MSCCL AllToAll LL128 | MSCCL AllToAll Simple | MSCCL++ AllGather K0 | MSCCL++ AllGather K1 |
|
|
|:------------:|:--------------:|:-------------:|:-----------------:|:--------------------:|:---------------------:|:--------------------:|:--------------------:|
|
|
| 1G | 15.69 | 16.22 | 13.94 | 13.83 | 14.10 | **23.26** | **23.29** |
|
|
|
|
### 16p-2n
|
|
**Latency (us)**
|
|
| Message Size | NCCL AllGather | NCCL AllToAll | MSCCL AllToAll LL | MSCCL AllToAll LL128 | MSCCL AllToAll Simple | MSCCL++ AllGather K0 | MSCCL++ AllGather K1 |
|
|
|:------------:|:--------------:|:-------------:|:-----------------:|:--------------------:|:---------------------:|:--------------------:|:--------------------:|
|
|
| 1K | 31.70 | 45.12 | **22.55** | 39.33 | 56.93 | 159.14 | 230.52 |
|
|
|
|
**BusBW (GB/s)**
|
|
| Message Size | NCCL AllGather | NCCL AllToAll | MSCCL AllToAll LL | MSCCL AllToAll LL128 | MSCCL AllToAll Simple | MSCCL++ AllGather K0 | MSCCL++ AllGather K1 |
|
|
|:------------:|:--------------:|:-------------:|:-----------------:|:--------------------:|:---------------------:|:--------------------:|:--------------------:|
|
|
| 1G | 174.28 | 38.30 | 40.17 | 40.18 | 40.23 | **44.08** | 9.31 |
|
|
|
|
|
|
## Contributing
|
|
|
|
This project welcomes contributions and suggestions. Most contributions require you to agree to a
|
|
Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us
|
|
the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
|
|
|
|
When you submit a pull request, a CLA bot will automatically determine whether you need to provide
|
|
a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions
|
|
provided by the bot. You will only need to do this once across all repos using our CLA.
|
|
|
|
This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
|
|
For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or
|
|
contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.
|
|
|
|
## Trademarks
|
|
|
|
This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft
|
|
trademarks or logos is subject to and must follow
|
|
[Microsoft's Trademark & Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks/usage/general).
|
|
Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship.
|
|
Any use of third-party trademarks or logos are subject to those third-party's policies.
|