mscclpp

mirror of https://github.com/microsoft/mscclpp.git synced 2026-05-12 01:10:22 +00:00

Author	SHA1	Message	Date
Binyang Li	eeea00b298	Support python wheel build (#787 ) ## Support Python wheel build This PR modernizes the Python packaging for MSCCL++ by defining dependencies and optional extras in `pyproject.toml`, enabling proper wheel builds with `pip install ".[cuda12]"`. ### Changes `pyproject.toml` - Add `dependencies` (numpy, blake3, pybind11, sortedcontainers) - Add `optional-dependencies` for platform-specific CuPy (`cuda11`, `cuda12`, `cuda13`, `rocm6`), `benchmark`, and `test` extras - Bump minimum Python version from 3.8 to 3.10 `test/deploy/setup.sh` - Use `pip install ".[<platform>,benchmark,test]"` instead of separate `pip install -r requirements_*.txt` + `pip install .` steps - Add missing CUDA 13 case `docs/quickstart.md` - Update install instructions to use extras (e.g., `pip install ".[cuda12]"`) - Document all available extras and clarify that `rocm6` builds CuPy from source - Update Python version references to 3.10 `python/csrc/CMakeLists.txt`, `python/test/CMakeLists.txt` - Update `find_package(Python)` from 3.8 to 3.10 ### Notes - The `requirements_*.txt` files are kept for Docker base image builds where only dependencies (not the project itself) should be installed. - CuPy is intentionally not in base dependencies; users must specify a platform extra to get the correct pre-built wheel (or source build for ROCm). --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-16 21:24:45 -07:00
Changho Hwang	d63f9403c0	IB `host-no-atomic`: GDRCopy + mlx5dv Data Direct for memory-consistent low-latency signaling (#753 ) Major enhancements to the IB signal forwarding mechanisms (`host-no-atomic` mode), primarily adding support for GDRCopy and MLX5 Direct Verbs, and refactoring the signal forwarding path for IB HostNoAtomic mode. The changes fix memory consistency issues and reduce signaling latency. - GDRCopy and MLX5 Direct Verbs MR integration - Signal forwarding path redesign - Semaphore and connection API updates - Environment (`MSCCLPP_FORCE_DISABLE_GDR`) and documentation updates	2026-04-09 09:24:30 +00:00
Copilot	93f6eeaa6b	Remove GTest dependency, add code coverage, and refactor unit tests and CI pipelines (#744 ) - Removes the GTest dependency, replacing it with a minimal custom framework (`test/framework.*`) that covers only what the tests actually use: a unified `TEST()` macro with SFINAE-based fixture auto-detection, `EXPECT_*`/`ASSERT_*` assertions, environments, and setup/teardown. - `--exclude-perf-tests` flag and substring-based negative filtering - `MSCCLPP_ENABLE_COVERAGE` CMake option with gcov/lcov; CI uploads to Codecov - Merges standalone `test/perf/` into main test targets - Refactors Azure pipelines to reduce redundancies & make more readable --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: Changho Hwang <changhohwang@microsoft.com>	2026-03-24 23:34:38 -04:00
Binyang Li	e21513791a	Address comments for PR #692 (#733 ) Rename nanobind-exposed C++ types to Cpp* Replace MSCCLPP_EXECUTION_PLAN_DIR / MSCCLPP_NATIVE_CACHE_DIR with MSCCLPP_CACHE_DIR across C++ and Python.	2026-02-03 10:13:20 -08:00
Binyang Li	a707273701	Torch integration (#692 ) Reorganize current native algorithm implementation and DSL algorithm implementation. Provide unified API for DSL algo and native algo and provide interface to tune the algo Provide interface for pytorch integration with native API and DSL --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com> Co-authored-by: chhwang <8018170+chhwang@users.noreply.github.com>	2026-01-21 20:32:24 -08:00
Changho Hwang	9994f53cea	Fixes for no-IB systems (#667 ) * Add a compile flag `MSCCLPP_USE_IB` that explicitly specifies IB on/off * Fix `nvidia-peermem` check; no need for DMABUF supported systems * Fix `mp_unit_tests` to skip all IB tests when built with `-DMSCCLPP_USE_IB=OFF`	2025-10-29 10:03:02 -07:00
Changho Hwang	a48421872e	Fix docs (#656 ) * Fix Python doc generation * Remove `ChannelTrigger` and fix `ProxyTrigger` * Fixed package versions for consistency	2025-10-23 00:34:53 +00:00
Binyang Li	ddca185add	Address corner case when generating version file (#641 ) Address corner case for version file generation --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: github-actions <github-actions@github.com>	2025-10-07 14:32:33 -07:00
Qinghua Zhou	16a96ea77b	Support detailed version tracking that captures git repository information (#639 ) #### Version Format The package version includes the git commit hash directly in the version string for development builds: - Release version: `0.7.0` - Development version: `0.7.0.dev36+g6e2360d69` (includes short commit hash) - Development with uncommitted changes: `0.7.0.dev36+g6e2360d69.dirty` #### Checking Version Information After installation, you can check the version information in several ways: From Python: ```python import mscclpp # Access individual attributes print(f"Version: {mscclpp.__version__}") # Full version with commit Version: 0.7.0.dev36+g6e2360d69 # Get as dictionary mscclpp.version() {'version': '0.7.0.dev46+gb0d27c58f', 'base_version': '0.7.0', 'git_commit': 'b0d27c58f'} ``` #### Version Information Details The version tracking captures: - Package Version (`mscclpp.__version__`): Full version string including git commit (e.g., `0.7.0.dev36+g6e2360d69`) This information is embedded during the package build process and remains accessible even after distribution, making it easier to debug issues and ensure reproducibility. --------- Co-authored-by: Binyang Li <binyli@microsoft.com>	2025-09-30 09:00:33 -07:00
Binyang Li	ba4c4aaeb8	Integrate MSCCL++ with torch workload (#626 ) Integrate MSCCL++ with torch. Introduce the `NCCL audit shim library`; users can use the following commands to launch the torch library. Also avoid breaking the build pipeline on CPU machines. ```bash export LD_AUDIT=$MSCCLPP_INSTALL_DIR/libmscclpp_audit_nccl.so export LD_LIBRARY_PATH=$MSCCLPP_INSTALL_DIR:$LD_LIBRARY_PATH torchrun --nnodes=1 --nproc_per_node=8 your_script.py ```	2025-09-09 13:28:32 -07:00
Changho Hwang	2eadbaf86f	python doc auto generation (#605 ) Add Python API references	2025-08-11 10:34:29 -07:00
Changho Hwang	9650e5c37e	Update documentation (#576 ) Documentation overhaul	2025-08-07 15:37:37 -07:00
Binyang Li	4136153a76	[Doc] mscclpp docs (#348 ) Generate docs for mscclpp. Set up a GitHub Action to auto-deploy the GitHub Pages docs; link here: https://microsoft.github.io/mscclpp --------- Co-authored-by: Changho Hwang <changhohwang@microsoft.com> Co-authored-by: Caio Rocha <caiorocha@microsoft.com>	2024-10-18 06:08:31 +00:00
caiomcbr	b1b9d0626c	Support NCCL APIs (#319 ) Start supporting NCCL APIs with a few limitations. --------- Co-authored-by: Caio Rocha <caio.rocha@microsoft.com> Co-authored-by: Changho Hwang <changhohwang@microsoft.com>	2024-06-27 23:54:06 +00:00
Angelica Moreira	34f4d9d006	Update quickstart.md (#314 ) Updating the docker image name tag and the python benchmark path.	2024-06-19 22:26:13 +00:00
Changho Hwang	1a7cb98e3a	v0.4.3 (#279 )	2024-03-27 11:53:09 -07:00
Changho Hwang	cdaf3aea3d	New packet format & optimizations (#256 ) Co-authored-by: Binyang Li <binyli@microsoft.com>	2024-02-20 20:01:37 -08:00
Binyang Li	f1b2c9df12	Fix performance downgrade issue & update doc (#229 ) For the push function, we only need to make sure the `st.global` instruction is executed after the while loop. Since there is a Write-After-Read hazard on `trigger.fst` (we check `this->triggers[curFifoHead % size].fst != 0` first, then write the value to `triggers[curFifoHead % size]`), we can expect the compiler and hardware to handle this situation correctly. Remove the `release.sys` there. Note that `st.global.release.sys.v2.u64` causes a perf regression. Previously we used `st.global.release.cta.v2.u64`, but it seems unnecessary.	2023-12-04 10:20:10 -08:00
Changho Hwang	351b95b926	Update documents (#225 ) Adding AMD supports on the docs	2023-11-24 17:00:18 +08:00
Changho Hwang	15f6dcca49	Update documentation (#217 ) Co-authored-by: Saeed Maleki <saemal@microsoft.com>	2023-11-22 12:58:04 -08:00
Changho Hwang	f68820436c	Explicit build dependency on `nvidia_peermem` (#201 )	2023-10-23 04:29:30 +00:00
Changho Hwang	3df18d20a3	Update install guidelines (#159 )	2023-08-30 10:40:40 -07:00
Changho Hwang	4114d65c60	Documents & minor updates (#119 ) Co-authored-by: Saeed Maleki <saemal@microsoft.com> Co-authored-by: Binyang Li <binyli@microsoft.com>	2023-07-07 17:35:05 +08:00
Changho Hwang	85e664c2f7	Update docs (#88 )	2023-06-05 13:13:10 +08:00

24 Commits
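
The version format described in commit 16a96ea77b (release `0.7.0`, development `0.7.0.dev36+g6e2360d69`, and the same with a `.dirty` suffix) can be split into its parts with a small regular expression. The following is only a sketch under stated assumptions: `parse_version`, the pattern, and the `dev_number` key are hypothetical names inferred from the three example strings in that commit message, not code from MSCCL++ itself, and the commit does not say what the number after `dev` counts.

```python
import re

# Hypothetical pattern inferred from the example strings in commit 16a96ea77b;
# it is NOT taken from the MSCCL++ source.
_VERSION_RE = re.compile(
    r"^(?P<base>\d+\.\d+\.\d+)"          # base version, e.g. 0.7.0
    r"(?:\.dev(?P<dev>\d+)"              # optional development counter
    r"\+g(?P<commit>[0-9a-f]+)"          # short git commit hash after "+g"
    r"(?P<dirty>\.dirty)?)?$"            # optional uncommitted-changes marker
)


def parse_version(version: str) -> dict:
    """Split a version string of the documented shape into its components."""
    match = _VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    dev = match.group("dev")
    return {
        "base_version": match.group("base"),
        "dev_number": int(dev) if dev is not None else None,
        "git_commit": match.group("commit"),  # None for release versions
        "dirty": match.group("dirty") is not None,
    }


if __name__ == "__main__":
    for example in ("0.7.0", "0.7.0.dev36+g6e2360d69", "0.7.0.dev36+g6e2360d69.dirty"):
        print(example, "->", parse_version(example))
```

On an installed build, the same helper could be applied to `mscclpp.__version__`, the attribute the commit message names as carrying the full version string.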