41 Commits

Author SHA1 Message Date
Changho Hwang
67f9933ba1 fix data direct 2026-04-01 10:20:43 +00:00
copilot-swe-agent[bot]
bff76d5b85 Fix TearDown() handling and replace assert() in perf tests
Address review comments:
1. Ensure TearDown() is always called if SetUp() succeeds, even when
   TestBody() throws. This prevents resource leaks and maintains MPI
   synchronization between tests.
2. Replace assert() in fifo_perf_tests.cu with proper return false
   on validation failure, ensuring consistent test failure reporting.

Fixes:
- test/framework.cc: Track SetUp success and call TearDown in finally-style
- test/unit/fifo_perf_tests.cu: Replace assert with explicit check

Co-authored-by: chhwang <8018170+chhwang@users.noreply.github.com>
2026-03-18 19:44:11 +00:00
Changho Hwang
e40c72bd2b license text update 2026-02-18 20:12:32 -08:00
Changho Hwang
b6ce0f2ede simplify 2026-02-18 19:16:21 -08:00
copilot-swe-agent[bot]
e26f8abbd4 Address PR review comments
1. Add missing includes to fifo_perf_tests.cu
   - Add #include <cassert>
   - Add #include <unordered_map>

2. Fix license header capitalization (4 files)
   - "license." → "License." in test/framework.{hpp,cc}
   - "license." → "License." in test/unit/{unit_tests_main.cc,fifo_perf_tests.cu}

3. Fix double MPI_Init issue
   - Check MPI_Initialized() before calling MPI_Init
   - Prevents double initialization when mp_unit_tests already inits MPI

4. Fix coverage flags for CUDA compilation
   - Use generator expressions to apply --coverage only to C++ language
   - Prevents breaking CUDA compilation with host-only flags

5. Fix environment memory leak
   - Delete environment objects after TearDown()
   - Clear environments_ vector

6. Implement proper GTEST_SKIP handling
   - Create SkipException class
   - Handle skipped tests separately from failures
   - Report skipped test count

7. Implement GTest-style filter pattern matching
   - Support wildcards (* and ?)
   - Support negative patterns (-Pattern)
   - Support colon-separated patterns (Foo:Bar)
   - Compatible with existing CI usage like --gtest_filter=-*Ib*

Verified builds successfully with Docker.

Co-authored-by: chhwang <8018170+chhwang@users.noreply.github.com>
2026-02-11 08:32:28 +00:00
copilot-swe-agent[bot]
50f6a24b69 Remove test/perf/ directory completely
- Remove test/perf/ directory (fifo_test.cu, framework.{cc,hpp}, CMakeLists.txt)
- Remove add_subdirectory(perf) from test/CMakeLists.txt
- Performance tests now integrated into unit_tests as fifo_perf_tests.cu
- Fix mp_unit_tests.cc to use framework functions without ::testing:: namespace
- Fix bootstrap_tests.cc ErrorCode comparison to use ASSERT_TRUE
- Fix switch_channel_tests.cu to not use streaming with ASSERT_EQ
- Add missing #include <unistd.h> to executor_tests.cc

All perf test functionality is now in unit_tests and can be filtered
with --exclude-perf-tests flag. The standalone test/perf/ directory
is no longer needed.

Verified builds:
- unit_tests: 
- mp_unit_tests: 

Co-authored-by: chhwang <8018170+chhwang@users.noreply.github.com>
2026-02-11 03:16:00 +00:00
copilot-swe-agent[bot]
b59196b8a5 Integrate perf tests into unit_tests and add CI coverage step
- Add unit_tests_main.cc with main() function for unit_tests executable
- Create fifo_perf_tests.cu as PERF_TEST for unit_tests
- Add fifo_perf_tests.cu to unit_tests sources
- Fix errors_tests.cc to use ASSERT_TRUE for ErrorCode comparisons
- Fix core_tests.cc to use ASSERT_TRUE for TransportFlags comparisons
- Add Azure pipeline step for Debug build with coverage
- Add step to run mp_unit_tests --exclude-perf-tests with coverage

The perf tests are now part of unit_tests and can be filtered out
for coverage reporting. CI now includes Debug build with coverage
collection for non-performance tests.

Co-authored-by: chhwang <8018170+chhwang@users.noreply.github.com>
2026-02-11 02:35:46 +00:00
copilot-swe-agent[bot]
305d15717e Remove PerfTestResult and reuse TestResult directly
- Add nlohmann::ordered_json metrics field to TestResult struct
- Add nlohmann/json.hpp include to test/framework.hpp
- Link test_framework with nlohmann_json::nlohmann_json
- Replace PerfTestResult with TestResult in test/perf/framework.cc
- Move perf utility functions to utils namespace for consistency
- Remove duplicate PerfTestResult struct definition

This consolidates the two similar structs into one, reducing code
duplication while maintaining all necessary fields for both unit
tests (passed/failure_message) and performance tests (metrics).

Verified build succeeds with Docker:
docker run --rm -v $(pwd):/workspace -w /workspace \
  ghcr.io/microsoft/mscclpp/mscclpp:base-dev-cuda12.4 bash -c \
  "cd /workspace/build && make -j4 fifo_test"

Co-authored-by: chhwang <8018170+chhwang@users.noreply.github.com>
2026-02-11 02:21:10 +00:00
copilot-swe-agent[bot]
0eae34c53d Fix test framework for building with Docker
- Make MPI REQUIRED for test builds (clearer error messages)
- Add project include directories to test_framework library
- Fix core_tests.cc to use custom framework correctly
- Fix mp_unit_tests.hpp to use mscclpp::test namespace
- Add FAIL() macro with streaming support for test messages
- Building tests now works in Docker environment with GPU bypass

Tests can now be built using:
docker run --rm -v $(pwd):/workspace -w /workspace \
  ghcr.io/microsoft/mscclpp/mscclpp:base-dev-cuda12.4 bash -c \
  "mkdir build && cd build && cmake -DMSCCLPP_BYPASS_GPU_CHECK=ON \
   -DMSCCLPP_USE_CUDA=ON .. && make -j"

Co-authored-by: chhwang <8018170+chhwang@users.noreply.github.com>
2026-02-11 01:13:29 +00:00
copilot-swe-agent[bot]
e227fdc1ef Convert mp_unit tests from gtest to framework.hpp
- Modified test/mp_unit/mp_unit_tests.hpp to use ../framework.hpp instead of gtest/gtest.h
- Enhanced test/framework.hpp with GTest-compatible APIs:
  - Added Environment base class for global test setup/teardown
  - Added TestInfo and UnitTest classes for test metadata access
  - Added GTEST_SKIP macro support via SkipHelper class
  - Added namespace alias 'testing' for compatibility
  - Added InitGoogleTest and AddGlobalTestEnvironment helper functions
- Updated test/framework.cc with implementations for new classes
- All mp_unit test files now use framework.hpp through mp_unit_tests.hpp
- Formatting applied via lint.sh

Co-authored-by: chhwang <8018170+chhwang@users.noreply.github.com>
2026-02-11 00:21:04 +00:00
copilot-swe-agent[bot]
c881bc5e16 Replace gtest/gtest.h with framework.hpp in all unit tests
Co-authored-by: chhwang <8018170+chhwang@users.noreply.github.com>
2026-02-11 00:17:18 +00:00
Qinghua Zhou
620378b4fb Fix cpplint error in main branch (#740)
Fix the legacy cpplint error in main branch.

---------

Co-authored-by: Qinghua Zhou <qinghuahzhou@microsoft.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Binyang Li <binyli@microsoft.com>
2026-02-05 09:25:12 -08:00
Changho Hwang
ffafcaf6d6 IB stack enhancements & bug fixes (#673)
* Always use `ibv_reg_dmabuf_mr` when DMABUF is supported
* Do not check `nvidia-peermem` when unnecessary
* More rigorous check on IB port availability
* Fixed ibverbs wrappers
* Fixed `IbPeerToPeerTest.SimpleAtomicAdd` test
2025-11-07 14:26:17 -08:00
Changho Hwang
ae56698d67 New semaphore constructors (#559)
More intuitive interfaces for creating semaphores and channels. Also
allows channel construction using third-party bootstrappers directly
without overriding MSCCL++ Bootstrap.
2025-07-12 00:10:46 +00:00
Changho Hwang
20eca28942 Fix a FIFO correctness bug (#549)
* Add a FIFO test code that reproduced a correctness issue
* Fix the correctness issue by using pinned memory instead of cudaMemcpy

---------

Co-authored-by: Binyang Li <binyli@microsoft.com>
2025-07-11 23:53:59 +00:00
Changho Hwang
22e8db4885 Support connection between local endpoints (#561) 2025-07-02 13:02:44 -07:00
Changho Hwang
253a1ba1a9 Use a stream pool for gpuCalloc*() (#509)
Previous `gpuCalloc*()` creates a new stream for each allocation, which
messes the timeline up in profiler traces. Now `GpuStreamPool` allows
reusing the temporal streams.
2025-06-04 10:07:20 -07:00
Changho Hwang
83356957bd Improved documentation & minor interface revision (#541) 2025-06-03 14:26:27 -07:00
Changho Hwang
de664ad200 Fix #514 (#521)
* In cases when the same `tag` is used for receiving data from the same
remote rank, #514 changed the behavior of `Communicator::connect` and
`Communicator::recvMemory` to receive data in the order of
`std::shared_future::get()` is called, instead of the original behvaior
that receive data in the order of the method calls. Since the original
behavior is more intuitive, we get that back. Now when `get()` is called
on a future, the async function will first call `wait()` on the latest
previously returned future. In a recursive manner, this will call
`wait()` on all previous futures that are not yet ready.
* Removed all deprecated API calls and replaced into the new ones.
2025-05-13 13:43:35 -07:00
Changho Hwang
d636093336 Asynchronous setup (#514)
Cherry-picked a part of features from #167: now `Communicator::setup()`
is unneeded. `Communicator::sendMemory()` conducts the task inline, and
`Communicator::recvMemory()` and `Communicator::connect()` conducts the
task asynchronously without explicit setup.
2025-05-08 22:01:51 +00:00
Binyang Li
e412804eab Improve signal/wait performance and fix barrier issue (#499)
Remove __assert_fail for release build. This will reduce the number of
PTX instructions inside the loop. Also Trying to resolve this issue
reported in #497. Reduce the number of PTX instructions from 8 to 6.
8 ranks signal/wait will reduce from 3.2us->2.8us on NDv5
Also NDEBUG flag is confused here, sometime it will not be set. Use
customized flag for debug build.

Here is current PTX:
```
      ld.u64  %rd12, [%rd2+-24];
      mov.u64         %rd13, %rd12;
      mov.u64         %rd11, %rd13;
      ld.acquire.sys.b64 %rd10,[%rd11];
      setp.lt.u64     %p1, %rd10, %rd3;
      @%p1 bra        $L__BB2_1;
```

If we change to `asm volatile("ld.global.acquire.sys.b64 %0, [%1];" :
"=l"(flag) : "l"(flag_addr));` will reduce to 4 instructions. We can get
2.1 us for 8 ranks signal/wait
```
        ld.u64  %rd9, [%rd1+-24];
        ld.global.acquire.sys.b64 %rd8, [%rd9];
        setp.lt.u64     %p1, %rd8, %rd2;
        @%p1 bra        $L__BB2_1;
```
2025-04-16 14:22:10 -07:00
Changho Hwang
34945fb107 Add GpuBuffer class (#423)
* Renamed and moved mem alloc functions into the `mscclpp::detail::`
namespace (now `mscclpp::detail::gpuCalloc*<T>()`)
* Deprecated constructor-calling mem alloc functions
(`mscclpp::makeShared*<T>()` and `mscclpp::makeUnique*<T>()`)
* Added a new `mscclpp::GpuBuffer<T>()` class that should be used in
general for allocating communication buffers
* Added a new `mscclpp.utils.GpuBuffer` Python class that inherits
`cupy.ndarray` and allocates using `mscclpp::gpuMemAlloc`
* Renamed `mscclpp::memcpyCuda*<T>()` functions into
`mscclpp::gpuMemcpy*<T>()` for name consistency
* A few fixes in NVLS memory allocation
* Tackled minor compiler warnings
2025-01-07 18:40:01 -08:00
Changho Hwang
0c150e5166 Fix copyright messages (#367) 2024-10-17 21:25:46 -07:00
Changho Hwang
f76eae4dca Fix assert declaration & add a compile test (#303) 2024-05-20 02:39:30 +00:00
Changho Hwang
1a7cb98e3a v0.4.3 (#279) 2024-03-27 11:53:09 -07:00
Changho Hwang
5ba6ce00c7 Fix bootstrapping mechanism (#278)
Co-authored-by: Binyang Li <binyli@microsoft.com>
Co-authored-by: Pashupati Kumar <74680231+pash-msft@users.noreply.github.com>
2024-03-27 10:24:24 +08:00
Changho Hwang
544ff0c21d ROCm support (#213)
Co-authored-by: Binyang Li <binyli@microsoft.com>
2023-11-24 16:41:56 +08:00
Olli Saarikivi
828be48b21 Add Context and Endpoint classes to enable non-Communicator use-cases (#166)
This PR implements and closes #137. The new `Endpoint` and `Context`
classes expose the connection establishing functionality from
`Communicator`, which now is only responsible for tying together the
bootstrapper with a context.

The largest breaking change here is that
`Communicator.connectOnSetup(...)` now returns the `Connection` wrapped
inside a `NonblockingFuture`. This is because with the way `Context` is
implemented a `Connection` is now fully initialized on construction.

Some smaller breaking API changes from this change are that
`RegisteredMemory` no longer has a `rank()` function (as there maybe no
concept of rank), and similarly `Connection` has no `remoteRank()` and
`tag()` functions. The latter are replaced by `remoteRankOf` and `tagOf`
functions in `Communicator`.

A new `EndpointConfig` class is introduced to avoid duplication of the
IB configuration parameters in the APIs of `Context` and `Communicator`.
The usual usage pattern of just passing in a `Transport` still works due
to an implicit conversion into `EndpointConfig`.

Miscellaneous changes:
-Cleans up how the PIMPL pattern is applied by making both the `Impl`
struct and the `pimpl_` pointers private for all relevant classes in the
core API.
-Enables ctest to be run from the build root directory.
2023-09-06 13:10:04 +08:00
Saeed Maleki
8d1b984bed Change device handle interfaces & others (#142)
* Changed device handle interfaces
* Changed proxy service interfaces
* Move device code into separate files
* Fixed FIFO polling issues
* Add configuration arguments in several interface functions

---------

Co-authored-by: Changho Hwang <changhohwang@microsoft.com>
Co-authored-by: Binyang Li <binyli@microsoft.com>
Co-authored-by: root <root@a100-saemal0.qxveptpukjsuthqvv514inp03c.gx.internal.cloudapp.net>
2023-08-16 20:00:56 +08:00
Saeed Maleki
e7d5e652df Python bindings (#125)
Co-authored-by: Olli Saarikivi <olsaarik@microsoft.com>
Co-authored-by: Changho Hwang <changhohwang@microsoft.com>
Co-authored-by: Binyang Li <binyli@microsoft.com>
2023-07-19 15:35:54 +08:00
Changho Hwang
6ec585f3d8 Packet copy for IB (#109)
* Extend channels to support LL with IB
* Rename classes and interfaces
2023-06-28 10:39:31 -07:00
Saeed Maleki
df2f0c14ab bootstrap now takes interface (#113)
This PR fixes the issue regarding taking the interface as an input.
2023-06-29 00:16:06 +08:00
Changho Hwang
21eed722af Add license comments (#106) 2023-06-25 12:40:12 +08:00
Changho Hwang
60b3dd5a61 Bug fixes & resolve warnings (#107)
* Fix a bug in host hashing
* Fix a bug in `HostEpoch::wait()`
* Remove misc warnings
2023-06-16 09:31:23 +00:00
Changho Hwang
c4a5958dfc Fix hanging bootstrap issues (#100)
* Renew socket interfaces and error handling into C++ style
* Fix bootstrap hanging bugs
* Misc code cleanup

---------

Co-authored-by: Binyang Li <binyli@microsoft.com>
Co-authored-by: Saeed Maleki <saemal@microsoft.com>
2023-06-15 11:29:49 +08:00
Changho Hwang
798631bd52 Update unit tests (#81) 2023-06-08 09:58:05 +00:00
Changho Hwang
9cee6c4a74 Cleanup old files and functions (#86) 2023-06-01 17:34:57 +08:00
Olli Saarikivi
457c422791 Remove alloc.h and beef up cuda_utils.hpp (#82) 2023-05-24 08:34:18 +00:00
Olli Saarikivi
4e4d1972e3 Cuda smart pointers 2023-05-16 16:16:00 -07:00
Olli Saarikivi
beaf2aea39 Move public headers under include/ 2023-05-10 20:46:49 +00:00
Olli Saarikivi
f4ecae7c96 Rename tests/ to test/ 2023-05-10 18:49:02 +00:00