mscclpp

mirror of https://github.com/microsoft/mscclpp.git synced 2026-05-12 17:26:04 +00:00

Author	SHA1	Message	Date
Changho Hwang	67f9933ba1	fix data direct	2026-04-01 10:20:43 +00:00
copilot-swe-agent[bot]	bff76d5b85	Fix TearDown() handling and replace assert() in perf tests Address review comments: 1. Ensure TearDown() is always called if SetUp() succeeds, even when TestBody() throws. This prevents resource leaks and maintains MPI synchronization between tests. 2. Replace assert() in fifo_perf_tests.cu with proper return false on validation failure, ensuring consistent test failure reporting. Fixes: - test/framework.cc: Track SetUp success and call TearDown in finally-style - test/unit/fifo_perf_tests.cu: Replace assert with explicit check Co-authored-by: chhwang <8018170+chhwang@users.noreply.github.com>	2026-03-18 19:44:11 +00:00
Changho Hwang	e40c72bd2b	license text update	2026-02-18 20:12:32 -08:00
Changho Hwang	b6ce0f2ede	simplify	2026-02-18 19:16:21 -08:00
copilot-swe-agent[bot]	e26f8abbd4	Address PR review comments 1. Add missing includes to fifo_perf_tests.cu - Add #include <cassert> - Add #include <unordered_map> 2. Fix license header capitalization (4 files) - "license." → "License." in test/framework.{hpp,cc} - "license." → "License." in test/unit/{unit_tests_main.cc,fifo_perf_tests.cu} 3. Fix double MPI_Init issue - Check MPI_Initialized() before calling MPI_Init - Prevents double initialization when mp_unit_tests already inits MPI 4. Fix coverage flags for CUDA compilation - Use generator expressions to apply --coverage only to C++ language - Prevents breaking CUDA compilation with host-only flags 5. Fix environment memory leak - Delete environment objects after TearDown() - Clear environments_ vector 6. Implement proper GTEST_SKIP handling - Create SkipException class - Handle skipped tests separately from failures - Report skipped test count 7. Implement GTest-style filter pattern matching - Support wildcards (* and ?) - Support negative patterns (-Pattern) - Support colon-separated patterns (Foo:Bar) - Compatible with existing CI usage like --gtest_filter=-Ib Verified builds successfully with Docker. Co-authored-by: chhwang <8018170+chhwang@users.noreply.github.com>	2026-02-11 08:32:28 +00:00
copilot-swe-agent[bot]	50f6a24b69	Remove test/perf/ directory completely - Remove test/perf/ directory (fifo_test.cu, framework.{cc,hpp}, CMakeLists.txt) - Remove add_subdirectory(perf) from test/CMakeLists.txt - Performance tests now integrated into unit_tests as fifo_perf_tests.cu - Fix mp_unit_tests.cc to use framework functions without ::testing:: namespace - Fix bootstrap_tests.cc ErrorCode comparison to use ASSERT_TRUE - Fix switch_channel_tests.cu to not use streaming with ASSERT_EQ - Add missing #include <unistd.h> to executor_tests.cc All perf test functionality is now in unit_tests and can be filtered with --exclude-perf-tests flag. The standalone test/perf/ directory is no longer needed. Verified builds: - unit_tests: ✅ - mp_unit_tests: ✅ Co-authored-by: chhwang <8018170+chhwang@users.noreply.github.com>	2026-02-11 03:16:00 +00:00
copilot-swe-agent[bot]	b59196b8a5	Integrate perf tests into unit_tests and add CI coverage step - Add unit_tests_main.cc with main() function for unit_tests executable - Create fifo_perf_tests.cu as PERF_TEST for unit_tests - Add fifo_perf_tests.cu to unit_tests sources - Fix errors_tests.cc to use ASSERT_TRUE for ErrorCode comparisons - Fix core_tests.cc to use ASSERT_TRUE for TransportFlags comparisons - Add Azure pipeline step for Debug build with coverage - Add step to run mp_unit_tests --exclude-perf-tests with coverage The perf tests are now part of unit_tests and can be filtered out for coverage reporting. CI now includes Debug build with coverage collection for non-performance tests. Co-authored-by: chhwang <8018170+chhwang@users.noreply.github.com>	2026-02-11 02:35:46 +00:00
copilot-swe-agent[bot]	305d15717e	Remove PerfTestResult and reuse TestResult directly - Add nlohmann::ordered_json metrics field to TestResult struct - Add nlohmann/json.hpp include to test/framework.hpp - Link test_framework with nlohmann_json::nlohmann_json - Replace PerfTestResult with TestResult in test/perf/framework.cc - Move perf utility functions to utils namespace for consistency - Remove duplicate PerfTestResult struct definition This consolidates the two similar structs into one, reducing code duplication while maintaining all necessary fields for both unit tests (passed/failure_message) and performance tests (metrics). Verified build succeeds with Docker: docker run --rm -v $(pwd):/workspace -w /workspace \ ghcr.io/microsoft/mscclpp/mscclpp:base-dev-cuda12.4 bash -c \ "cd /workspace/build && make -j4 fifo_test" Co-authored-by: chhwang <8018170+chhwang@users.noreply.github.com>	2026-02-11 02:21:10 +00:00
copilot-swe-agent[bot]	0eae34c53d	Fix test framework for building with Docker - Make MPI REQUIRED for test builds (clearer error messages) - Add project include directories to test_framework library - Fix core_tests.cc to use custom framework correctly - Fix mp_unit_tests.hpp to use mscclpp::test namespace - Add FAIL() macro with streaming support for test messages - Building tests now works in Docker environment with GPU bypass Tests can now be built using: docker run --rm -v $(pwd):/workspace -w /workspace \ ghcr.io/microsoft/mscclpp/mscclpp:base-dev-cuda12.4 bash -c \ "mkdir build && cd build && cmake -DMSCCLPP_BYPASS_GPU_CHECK=ON \ -DMSCCLPP_USE_CUDA=ON .. && make -j" Co-authored-by: chhwang <8018170+chhwang@users.noreply.github.com>	2026-02-11 01:13:29 +00:00
copilot-swe-agent[bot]	e227fdc1ef	Convert mp_unit tests from gtest to framework.hpp - Modified test/mp_unit/mp_unit_tests.hpp to use ../framework.hpp instead of gtest/gtest.h - Enhanced test/framework.hpp with GTest-compatible APIs: - Added Environment base class for global test setup/teardown - Added TestInfo and UnitTest classes for test metadata access - Added GTEST_SKIP macro support via SkipHelper class - Added namespace alias 'testing' for compatibility - Added InitGoogleTest and AddGlobalTestEnvironment helper functions - Updated test/framework.cc with implementations for new classes - All mp_unit test files now use framework.hpp through mp_unit_tests.hpp - Formatting applied via lint.sh Co-authored-by: chhwang <8018170+chhwang@users.noreply.github.com>	2026-02-11 00:21:04 +00:00
copilot-swe-agent[bot]	c881bc5e16	Replace gtest/gtest.h with framework.hpp in all unit tests Co-authored-by: chhwang <8018170+chhwang@users.noreply.github.com>	2026-02-11 00:17:18 +00:00
Qinghua Zhou	620378b4fb	Fix cpplint error in main branch (#740 ) Fix the legacy cpplint error in main branch. --------- Co-authored-by: Qinghua Zhou <qinghuahzhou@microsoft.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: Binyang Li <binyli@microsoft.com>	2026-02-05 09:25:12 -08:00
Changho Hwang	ffafcaf6d6	IB stack enhancements & bug fixes (#673 ) * Always use `ibv_reg_dmabuf_mr` when DMABUF is supported * Do not check `nvidia-peermem` when unnecessary * More rigorous check on IB port availability * Fixed ibverbs wrappers * Fixed `IbPeerToPeerTest.SimpleAtomicAdd` test	2025-11-07 14:26:17 -08:00
Changho Hwang	ae56698d67	New semaphore constructors (#559 ) More intuitive interfaces for creating semaphores and channels. Also allows channel construction using third-party bootstrappers directly without overriding MSCCL++ Bootstrap.	2025-07-12 00:10:46 +00:00
Changho Hwang	20eca28942	Fix a FIFO correctness bug (#549 ) * Add a FIFO test code that reproduced a correctness issue * Fix the correctness issue by using pinned memory instead of cudaMemcpy --------- Co-authored-by: Binyang Li <binyli@microsoft.com>	2025-07-11 23:53:59 +00:00
Changho Hwang	22e8db4885	Support connection between local endpoints (#561 )	2025-07-02 13:02:44 -07:00
Changho Hwang	253a1ba1a9	Use a stream pool for `gpuCalloc()` (#509 ) Previously, `gpuCalloc()` created a new stream for each allocation, which cluttered the timeline in profiler traces. Now `GpuStreamPool` allows reusing the temporary streams.	2025-06-04 10:07:20 -07:00
Changho Hwang	83356957bd	Improved documentation & minor interface revision (#541 )	2025-06-03 14:26:27 -07:00
Changho Hwang	de664ad200	Fix #514 (#521 ) * In cases when the same `tag` is used for receiving data from the same remote rank, #514 changed the behavior of `Communicator::connect` and `Communicator::recvMemory` to receive data in the order in which `std::shared_future::get()` is called, instead of the original behavior, which receives data in the order of the method calls. Since the original behavior is more intuitive, we restore it. Now when `get()` is called on a future, the async function will first call `wait()` on the latest previously returned future. Recursively, this calls `wait()` on all previous futures that are not yet ready. * Removed all deprecated API calls and replaced them with the new ones.	2025-05-13 13:43:35 -07:00
Changho Hwang	d636093336	Asynchronous setup (#514 ) Cherry-picked some of the features from #167: now `Communicator::setup()` is unneeded. `Communicator::sendMemory()` conducts its task inline, and `Communicator::recvMemory()` and `Communicator::connect()` conduct their tasks asynchronously without explicit setup.	2025-05-08 22:01:51 +00:00
Binyang Li	e412804eab	Improve signal/wait performance and fix barrier issue (#499 ) Remove __assert_fail for release builds. This reduces the number of PTX instructions inside the loop from 8 to 6, and 8-rank signal/wait latency drops from 3.2 us to 2.8 us on NDv5. This also tries to resolve the issue reported in #497. The NDEBUG flag is also confusing here, since it is sometimes not set; use a customized flag for debug builds instead. Here is the current PTX: ``` ld.u64 %rd12, [%rd2+-24]; mov.u64 %rd13, %rd12; mov.u64 %rd11, %rd13; ld.acquire.sys.b64 %rd10,[%rd11]; setp.lt.u64 %p1, %rd10, %rd3; @%p1 bra $L__BB2_1; ``` If we change to `asm volatile("ld.global.acquire.sys.b64 %0, [%1];" : "=l"(flag) : "l"(flag_addr));`, this reduces to 4 instructions, and we can get 2.1 us for 8-rank signal/wait: ``` ld.u64 %rd9, [%rd1+-24]; ld.global.acquire.sys.b64 %rd8, [%rd9]; setp.lt.u64 %p1, %rd8, %rd2; @%p1 bra $L__BB2_1; ```	2025-04-16 14:22:10 -07:00
Changho Hwang	34945fb107	Add `GpuBuffer` class (#423 ) * Renamed and moved mem alloc functions into the `mscclpp::detail::` namespace (now `mscclpp::detail::gpuCalloc<T>()`) Deprecated constructor-calling mem alloc functions (`mscclpp::makeShared<T>()` and `mscclpp::makeUnique<T>()`) * Added a new `mscclpp::GpuBuffer<T>()` class that should be used in general for allocating communication buffers * Added a new `mscclpp.utils.GpuBuffer` Python class that inherits `cupy.ndarray` and allocates using `mscclpp::gpuMemAlloc` * Renamed `mscclpp::memcpyCuda<T>()` functions into `mscclpp::gpuMemcpy<T>()` for name consistency * A few fixes in NVLS memory allocation * Tackled minor compiler warnings	2025-01-07 18:40:01 -08:00
Changho Hwang	0c150e5166	Fix copyright messages (#367 )	2024-10-17 21:25:46 -07:00
Changho Hwang	f76eae4dca	Fix assert declaration & add a compile test (#303 )	2024-05-20 02:39:30 +00:00
Changho Hwang	1a7cb98e3a	v0.4.3 (#279 )	2024-03-27 11:53:09 -07:00
Changho Hwang	5ba6ce00c7	Fix bootstrapping mechanism (#278 ) Co-authored-by: Binyang Li <binyli@microsoft.com> Co-authored-by: Pashupati Kumar <74680231+pash-msft@users.noreply.github.com>	2024-03-27 10:24:24 +08:00
Changho Hwang	544ff0c21d	ROCm support (#213 ) Co-authored-by: Binyang Li <binyli@microsoft.com>	2023-11-24 16:41:56 +08:00
Olli Saarikivi	828be48b21	Add Context and Endpoint classes to enable non-Communicator use-cases (#166 ) This PR implements and closes #137. The new `Endpoint` and `Context` classes expose the connection-establishing functionality from `Communicator`, which now is only responsible for tying together the bootstrapper with a context. The largest breaking change here is that `Communicator.connectOnSetup(...)` now returns the `Connection` wrapped inside a `NonblockingFuture`. This is because, with the way `Context` is implemented, a `Connection` is now fully initialized on construction. Some smaller breaking API changes from this change are that `RegisteredMemory` no longer has a `rank()` function (as there may be no concept of rank), and similarly `Connection` has no `remoteRank()` and `tag()` functions. The latter are replaced by `remoteRankOf` and `tagOf` functions in `Communicator`. A new `EndpointConfig` class is introduced to avoid duplication of the IB configuration parameters in the APIs of `Context` and `Communicator`. The usual usage pattern of just passing in a `Transport` still works due to an implicit conversion into `EndpointConfig`. Miscellaneous changes: - Cleans up how the PIMPL pattern is applied by making both the `Impl` struct and the `pimpl_` pointers private for all relevant classes in the core API. - Enables ctest to be run from the build root directory.	2023-09-06 13:10:04 +08:00
Saeed Maleki	8d1b984bed	Change device handle interfaces & others (#142 ) * Changed device handle interfaces * Changed proxy service interfaces * Move device code into separate files * Fixed FIFO polling issues * Add configuration arguments in several interface functions --------- Co-authored-by: Changho Hwang <changhohwang@microsoft.com> Co-authored-by: Binyang Li <binyli@microsoft.com> Co-authored-by: root <root@a100-saemal0.qxveptpukjsuthqvv514inp03c.gx.internal.cloudapp.net>	2023-08-16 20:00:56 +08:00
Saeed Maleki	e7d5e652df	Python bindings (#125 ) Co-authored-by: Olli Saarikivi <olsaarik@microsoft.com> Co-authored-by: Changho Hwang <changhohwang@microsoft.com> Co-authored-by: Binyang Li <binyli@microsoft.com>	2023-07-19 15:35:54 +08:00
Changho Hwang	6ec585f3d8	Packet copy for IB (#109 ) * Extend channels to support LL with IB * Rename classes and interfaces	2023-06-28 10:39:31 -07:00
Saeed Maleki	df2f0c14ab	bootstrap now takes interface (#113 ) This PR fixes the issue regarding taking the interface as an input.	2023-06-29 00:16:06 +08:00
Changho Hwang	21eed722af	Add license comments (#106 )	2023-06-25 12:40:12 +08:00
Changho Hwang	60b3dd5a61	Bug fixes & resolve warnings (#107 ) * Fix a bug in host hashing * Fix a bug in `HostEpoch::wait()` * Remove misc warnings	2023-06-16 09:31:23 +00:00
Changho Hwang	c4a5958dfc	Fix hanging bootstrap issues (#100 ) * Renew socket interfaces and error handling into C++ style * Fix bootstrap hanging bugs * Misc code cleanup --------- Co-authored-by: Binyang Li <binyli@microsoft.com> Co-authored-by: Saeed Maleki <saemal@microsoft.com>	2023-06-15 11:29:49 +08:00
Changho Hwang	798631bd52	Update unit tests (#81 )	2023-06-08 09:58:05 +00:00
Changho Hwang	9cee6c4a74	Cleanup old files and functions (#86 )	2023-06-01 17:34:57 +08:00
Olli Saarikivi	457c422791	Remove alloc.h and beef up cuda_utils.hpp (#82 )	2023-05-24 08:34:18 +00:00
Olli Saarikivi	4e4d1972e3	Cuda smart pointers	2023-05-16 16:16:00 -07:00
Olli Saarikivi	beaf2aea39	Move public headers under include/	2023-05-10 20:46:49 +00:00
Olli Saarikivi	f4ecae7c96	Rename tests/ to test/	2023-05-10 18:49:02 +00:00

41 Commits