Project Contribution Guidelines
This document outlines some useful information for contributing to this project.
C/C++ Headers Layout
This project has two C/C++ header directories: include/mscclpp/ and src/include/. Headers in include/mscclpp/ are public headers that define the public API of the project. Headers in src/include/ are internal headers used only within the project.
When adding new headers, place them in the appropriate directory based on their intended usage (public API vs. internal use). To prevent confusion, do not have duplicate names for headers in these two directories.
Symbols declared in public headers must be properly documented using Doxygen-style comments, except for forward declarations and private class members. In a few cases, we may need to add declarations to public headers that are not intended for public use. In such cases, declare them under the mscclpp::detail namespace, where we do not necessarily document every symbol.
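As a rough illustration of the header rules above, the sketch below shows what a small public header might look like: a license header at the top, an undocumented forward declaration, an internal helper under mscclpp::detail, and a public function with Doxygen-style comments. The file name example.hpp, the include guard, the Buffer class, and the functions alignUp and paddedSize are all hypothetical and are not part of the actual mscclpp API.

```cpp
// Copyright (c) Microsoft Corporation.
// Licensed under the MIT License.

// Hypothetical public header, e.g. include/mscclpp/example.hpp.
// All names below are illustrative assumptions, not real mscclpp symbols.
#ifndef MSCCLPP_EXAMPLE_HPP_
#define MSCCLPP_EXAMPLE_HPP_

#include <cstddef>

namespace mscclpp {

// Forward declaration: no Doxygen comment required.
class Buffer;

namespace detail {

// Declared in a public header but not intended for public use,
// so it lives under mscclpp::detail and may stay undocumented.
inline std::size_t alignUp(std::size_t size, std::size_t alignment) {
  return (size + alignment - 1) / alignment * alignment;
}

}  // namespace detail

/// Compute the padded size of an allocation.
///
/// @param size Requested size in bytes.
/// @param alignment Required alignment in bytes. Must be non-zero.
/// @return The smallest multiple of @p alignment that is not less than @p size.
inline std::size_t paddedSize(std::size_t size, std::size_t alignment) {
  return detail::alignUp(size, alignment);
}

}  // namespace mscclpp

#endif  // MSCCLPP_EXAMPLE_HPP_
```

Only paddedSize carries a Doxygen comment here because it is the only symbol meant for users of the library. The helper in mscclpp::detail is visible to the compiler but signals by its namespace that callers should not depend on it.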
License Header
A license header must be included at the top of each source code file in the project.
For Python source code:
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
For C/C++/CUDA source code:
// Copyright (c) Microsoft Corporation.
// Licensed under the MIT License.
Formatting
If you have modified any code in the project, run ./tools/lint.sh to automatically format the entire source code before finalizing your changes. Note that this script formats only files that are tracked by git, so if you have added new files, make sure to git add them first.
Building and Testing
The following commands are commonly used for building and testing the project. See docs/quickstart.md for more detailed instructions.
For building libraries and tests:
mkdir -p build && cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
make -j$(nproc)
cd ..
For testing after successful build:
# To run tests with two GPUs - two is enough for most tests
mpirun -np 2 ./build/bin/mp_unit_tests
# To run tests excluding IB-related ones (when IB is not available)
mpirun -np 2 ./build/bin/mp_unit_tests --filter='-*Ib*'
For building a Python package:
python3 -m pip install -e .
For Python tests after building the package:
# Run tests with 8 GPUs - adjust the number as needed
mpirun -np 8 python3 -m pytest ./python/test/test_mscclpp.py -vx
For building documentation (see dependencies in docs/requirements.txt):
cd docs
doxygen
make html
cd ..