- Add unit_tests_main.cc with main() function for unit_tests executable
- Create fifo_perf_tests.cu as PERF_TEST for unit_tests
- Add fifo_perf_tests.cu to unit_tests sources
- Fix errors_tests.cc to use ASSERT_TRUE for ErrorCode comparisons
- Fix core_tests.cc to use ASSERT_TRUE for TransportFlags comparisons
- Add Azure pipeline step for Debug build with coverage
- Add step to run mp_unit_tests --exclude-perf-tests with coverage
The perf tests are now part of unit_tests and can be filtered out
for coverage reporting. CI now includes Debug build with coverage
collection for non-performance tests.
Co-authored-by: chhwang <8018170+chhwang@users.noreply.github.com>
- Add nlohmann::ordered_json metrics field to TestResult struct
- Add nlohmann/json.hpp include to test/framework.hpp
- Link test_framework with nlohmann_json::nlohmann_json
- Replace PerfTestResult with TestResult in test/perf/framework.cc
- Move perf utility functions to utils namespace for consistency
- Remove duplicate PerfTestResult struct definition
This consolidates the two similar structs into one, reducing code
duplication while maintaining all necessary fields for both unit
tests (passed/failure_message) and performance tests (metrics).
Verified build succeeds with Docker:
docker run --rm -v $(pwd):/workspace -w /workspace \
ghcr.io/microsoft/mscclpp/mscclpp:base-dev-cuda12.4 bash -c \
"cd /workspace/build && make -j4 fifo_test"
Co-authored-by: chhwang <8018170+chhwang@users.noreply.github.com>
- Make MPI REQUIRED for test builds (clearer error messages)
- Add project include directories to test_framework library
- Fix core_tests.cc to use custom framework correctly
- Fix mp_unit_tests.hpp to use mscclpp::test namespace
- Add FAIL() macro with streaming support for test messages
- Building tests now works in Docker environment with GPU bypass
Tests can now be built using:
docker run --rm -v $(pwd):/workspace -w /workspace \
ghcr.io/microsoft/mscclpp/mscclpp:base-dev-cuda12.4 bash -c \
"mkdir build && cd build && cmake -DMSCCLPP_BYPASS_GPU_CHECK=ON \
-DMSCCLPP_USE_CUDA=ON .. && make -j"
Co-authored-by: chhwang <8018170+chhwang@users.noreply.github.com>
- Modified test/mp_unit/mp_unit_tests.hpp to use ../framework.hpp instead of gtest/gtest.h
- Enhanced test/framework.hpp with GTest-compatible APIs:
- Added Environment base class for global test setup/teardown
- Added TestInfo and UnitTest classes for test metadata access
- Added GTEST_SKIP macro support via SkipHelper class
- Added namespace alias 'testing' for compatibility
- Added InitGoogleTest and AddGlobalTestEnvironment helper functions
- Updated test/framework.cc with implementations for new classes
- All mp_unit test files now use framework.hpp through mp_unit_tests.hpp
- Formatting applied via lint.sh
Co-authored-by: chhwang <8018170+chhwang@users.noreply.github.com>
More intuitive interfaces for creating semaphores and channels. Also
allows channel construction using third-party bootstrappers directly
without overriding MSCCL++ Bootstrap.
* In cases when the same `tag` is used for receiving data from the same
remote rank, #514 changed the behavior of `Communicator::connect` and
`Communicator::recvMemory` to receive data in the order of
`std::shared_future::get()` is called, instead of the original behvaior
that receive data in the order of the method calls. Since the original
behavior is more intuitive, we get that back. Now when `get()` is called
on a future, the async function will first call `wait()` on the latest
previously returned future. In a recursive manner, this will call
`wait()` on all previous futures that are not yet ready.
* Removed all deprecated API calls and replaced into the new ones.
Cherry-picked a part of features from #167: now `Communicator::setup()`
is unneeded. `Communicator::sendMemory()` conducts the task inline, and
`Communicator::recvMemory()` and `Communicator::connect()` conducts the
task asynchronously without explicit setup.
This PR implements and closes#137. The new `Endpoint` and `Context`
classes expose the connection establishing functionality from
`Communicator`, which now is only responsible for tying together the
bootstrapper with a context.
The largest breaking change here is that
`Communicator.connectOnSetup(...)` now returns the `Connection` wrapped
inside a `NonblockingFuture`. This is because with the way `Context` is
implemented a `Connection` is now fully initialized on construction.
Some smaller breaking API changes from this change are that
`RegisteredMemory` no longer has a `rank()` function (as there maybe no
concept of rank), and similarly `Connection` has no `remoteRank()` and
`tag()` functions. The latter are replaced by `remoteRankOf` and `tagOf`
functions in `Communicator`.
A new `EndpointConfig` class is introduced to avoid duplication of the
IB configuration parameters in the APIs of `Context` and `Communicator`.
The usual usage pattern of just passing in a `Transport` still works due
to an implicit conversion into `EndpointConfig`.
Miscellaneous changes:
-Cleans up how the PIMPL pattern is applied by making both the `Impl`
struct and the `pimpl_` pointers private for all relevant classes in the
core API.
-Enables ctest to be run from the build root directory.