Commit Graph

48 Commits

Author SHA1 Message Date
Qinghua Zhou
f0441ee4ea Update document versioning for PR #724 (#735)
This PR fix the issue of generating docs when we take
https://github.com/microsoft/mscclpp/pull/724 into main branch.
Build docs for main branch separately.
Use HEAD request instead of GET to check if a page exist.
Filter out versions before v0.4.0 in generate_versions.py.

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Binyang Li <binyli@microsoft.com>
2026-02-01 19:52:01 -08:00
Qinghua Zhou
cc797abc87 Revert "Support versioning for mscclpp document (#724)" (#734)
This PR reverts commit 69d3b7 to avoid the github page issue.
2026-01-23 16:42:54 -08:00
Qinghua Zhou
69d3b79ecd Support versioning for mscclpp document (#724)
Show all the versions of mscclpp document on the webpage
https://microsoft.github.io/mscclpp/
Add sphinx-multiversion to generate documents for different versions.
Add version selector on document webpage.

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Binyang Li <binyli@microsoft.com>
2026-01-23 09:45:41 -08:00
Binyang Li
a707273701 Torch integration (#692)
Reorganize current native algorithm implementation and DSL algorithm
implementation.
Provide unified API for DSL algo and native algo and provide interface
to tune the algo
Provide interface for pytorch integration with native API and DSL

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: chhwang <8018170+chhwang@users.noreply.github.com>
2026-01-21 20:32:24 -08:00
Changho Hwang
7b18a42274 Add copilot-instructions.md (#602) 2025-12-22 22:15:40 -08:00
Binyang Li
9eb958183c upgrade codeql to v3 (#676) 2025-11-06 16:58:19 -08:00
Changho Hwang
f7d1fb4492 Exclude irrelevant files from workflow triggers (#663) 2025-10-23 15:52:19 -07:00
Changho Hwang
58996b5c51 Fix docs version (#659)
Fetch full history of the repo for accurate version info
2025-10-23 11:14:27 -07:00
Changho Hwang
a48421872e Fix docs (#656)
* Fix Python doc generation
* Remove `ChannelTrigger` and fix `ProxyTrigger`
* Fixed package versions for consistency
2025-10-23 00:34:53 +00:00
Binyang Li
3d94383696 Add MSCCLPP_GIT_COMMIT micro (#640)
- Add MSCCLPP_GIT_COMMIT micro
- Update docs
2025-10-06 15:57:28 -07:00
Binyang Li
be6a941fba New DSL implementation (#579)
The PR contains following changes:
Python side:
- Channel based DSL implementation: decouple channel with chunk.
- Users create channel explicitly, only need local_rank, remote_rank and
channel_type
- Adjust executor json file, add remote_buffer fields, different op can
use different channel and remote buffers combination.
- Reimplement operation fusion, data dependency check mechanism
- Add new op such as semaphore, pipeline 
- Clean code and enhance document
C++ side: 
- Support new execution file json format
- Support semaphore and pipeline operation
- code clean, support non-zero copy scenario

---------

Co-authored-by: Caio Rocha <caiorocha@microsoft.com>
Co-authored-by: Changho Hwang <changhohwang@microsoft.com>
2025-08-09 00:36:20 -07:00
Changho Hwang
5b84c8a3d1 Separate linters from cmake (#587) 2025-07-28 09:59:20 +08:00
Binyang Li
adc9ee5684 Export mscclpp GpuBuffer to dlpack format (#492)
For mscclpp, to use nvls we require the buffer is allocated by
mscclpp::GpuBuffer. Due to cupy doesn't support bfloat16 yet, we export
the raw buffer to dlpack format.
User can use this feature to create buffer with type supported by
pytorch
```python
buffer = RawGpuBuffer(1024 * 2) # 2 for bfloat16
dl_pack = buffer.to_dlpack(str(torch.bfloat16))
tensor = torch.utils.dlpack.from_dlpack(dl_pack)
```
2025-04-03 12:59:32 -07:00
Changho Hwang
def68ced64 Add CUDA 12.8 images (#488) 2025-03-29 00:31:26 +00:00
Binyang Li
7f3b088744 Add multi-nodes example & update doc (#455)
Documentation update:

*
[`docs/design/mscclpp-dsl.md`](diffhunk://#diff-02a69290fb3e02b8a069bf915fbf5266cfc2ac51c6e9ff8b5b19df51ed909b22L114-R114):
Updated the link to the examples folder to reflect the correct path.

New example script:

*
[`python/examples/allgather_allpairs_multinodes_packets.py`](diffhunk://#diff-ab42c16ecca0680d55b60b82a6913138c5fba4069b9c4493fbe8c72217fe54bcR1-R76):
Added a new example script demonstrating the allgather all-pairs
algorithm across multiple nodes using packet communication.

IR module improvements:

*
[`python/mscclpp/language/ir.py`](diffhunk://#diff-b025796b03fbbd9b2ca9aee2569547efa7a56101743bc4aa05661be0b52aeec9L470-R472):
Refined the sorting criteria for GPU instance channels and thread block
channels to include the channel type, ensuring a more accurate order.
Debugging enhancements:

*
[`src/executor/executor.cc`](diffhunk://#diff-60f7806d111e5cc12ded06358b5d5b09b8521e3858f182d8be81ac05147c535dR439-R441):
Added a debug log to indicate the start of communication collective
execution with details about the execution plan and collective.
*
[`src/include/debug.h`](diffhunk://#diff-24e5fda55e3712277be4bb99b3c348294a77ebd3046bfe716b74bdb32cd203dfR89):
Introduced a new debug log subsystem identifier `MSCCLPP_EXECUTOR` for
logging executor-related information.
2025-01-31 17:52:15 -08:00
Binyang Li
af0bb86e07 Merge mscclpp-lang to mscclpp project (#442)
First step to merge msccl-tools into mscclpp repo. In this step will
move all msccl related code, pass the current tests and do some
necessary refactor.

Add `mscclpp.language` module
Add `_InstructionOptimizer` and `DagOptimizer` class to optimize the dag
Add `DagLower` to lower dag to intermediate representation 
Add documents for mscclpp.language
Remove msccl related code
2025-01-22 09:47:37 -08:00
Changho Hwang
2b54af7e27 Auto-update version numbers in CMakeLists.txt (#450) 2025-01-09 17:54:10 -08:00
Binyang Li
3d6bfed2cf Update version number (#433)
Co-authored-by: github-actions <github-actions@github.com>
2025-01-02 16:45:08 -08:00
Binyang Li
f18a440feb trigger ci for release branches (#426) 2024-12-21 00:05:13 +00:00
Changho Hwang
2127a3ba29 Improve CMake options (#376)
* Let all CMake option names start with `MSCCLPP_`
* Explain the `MSCCLPP_BUILD_PYTHON_BINDINGS` option in readme

---------

Co-authored-by: Binyang Li <binyli@microsoft.com>
2024-11-22 01:54:11 +00:00
Binyang Li
4136153a76 [Doc] mscclpp docs (#348)
Generate docs for mescclpp.
Setup github action to auto-deploy github-page
doc link here: https://microsoft.github.io/mscclpp

---------

Co-authored-by: Changho Hwang <changhohwang@microsoft.com>
Co-authored-by: Caio Rocha <caiorocha@microsoft.com>
2024-10-18 06:08:31 +00:00
Changho Hwang
8a330f9135 Update ROCm CI (#357)
Co-authored-by: Binyang Li <binyli@microsoft.com>
2024-09-20 17:57:02 +00:00
caiomcbr
7493e2f075 Double buffering for NCCL APIs (#324)
Using two scratch buffers in each peer to exchange data.

---------

Co-authored-by: Changho Hwang <changhohwang@microsoft.com>
2024-07-15 22:18:53 +00:00
Binyang Li
422c81f0f8 remove make pylib-copy command (#249)
Fix #216
Remove `make pylib-copy`
2024-01-19 12:29:15 -08:00
Changho Hwang
5fa5bd2706 Check nvidia_peermem during runtime (#234) 2023-12-25 12:02:10 +08:00
Changho Hwang
c15a166cf0 Add a documentation issue template (#230) 2023-12-05 01:01:45 +00:00
Changho Hwang
544ff0c21d ROCm support (#213)
Co-authored-by: Binyang Li <binyli@microsoft.com>
2023-11-24 16:41:56 +08:00
Changho Hwang
dab19e00c1 Templatize Dockerfiles & update workflows (#223)
Now build images by a script with a shared Dockerfile template

---------

Co-authored-by: Binyang Li <binyli@microsoft.com>
Co-authored-by: Saeed Maleki <saemal@microsoft.com>
2023-11-22 13:29:12 -08:00
Changho Hwang
f68820436c Explicit build dependency on nvidia_peermem (#201) 2023-10-23 04:29:30 +00:00
Changho Hwang
8c0f9e84d0 v0.3.0 (#171) 2023-10-11 22:35:54 +08:00
Changho Hwang
11ac824cc7 Align interfaces of put/get/putPackets/getPackets (#185) 2023-10-07 22:18:26 +08:00
Changho Hwang
497a9e0c82 Add backup workflows (#189) 2023-10-07 15:13:49 +08:00
Changho Hwang
bb64f68d74 Update issue templates (#179) 2023-09-15 04:05:09 +00:00
Saeed Maleki
e7d5e652df Python bindings (#125)
Co-authored-by: Olli Saarikivi <olsaarik@microsoft.com>
Co-authored-by: Changho Hwang <changhohwang@microsoft.com>
Co-authored-by: Binyang Li <binyli@microsoft.com>
2023-07-19 15:35:54 +08:00
Binyang2014
56bdbc2f32 Enable test for both cuda11 and cuda12 (#124)
Update pipeline: enable test for both cuda11 and cuda12
2023-07-10 13:19:14 +08:00
Changho Hwang
4114d65c60 Documents & minor updates (#119)
Co-authored-by: Saeed Maleki <saemal@microsoft.com>
Co-authored-by: Binyang Li <binyli@microsoft.com>
2023-07-07 17:35:05 +08:00
Changho Hwang
bb7b85a810 2-node AllReduce improvements (#118)
* Added `get()` interfaces to `SmChannel`
* Improved 2-node (8 gpus/node) AllReduce: algbw 139GB/s for 1GB (kernel
3) and 99GB/s for 48MB (kernel 4)
* Fixed a FIFO perf bug
* Several fixes & validations in mscclpp-test

---------

Co-authored-by: Binyang Li <binyli@microsoft.com>
Co-authored-by: Saeed Maleki <saemal@microsoft.com>
2023-07-07 07:05:46 +00:00
Binyang2014
2640578b22 Add performance check for mscclpp-test (#110)
- Add ndmv4 perf baseline
- change mscclpp-test to output perf number into a json file
- add python script to check the perf result with the baseline
2023-06-21 07:42:53 +00:00
Changho Hwang
5a4885ccbb Misc updates (#95) 2023-06-12 13:53:43 +08:00
Changho Hwang
798631bd52 Update unit tests (#81) 2023-06-08 09:58:05 +00:00
Changho Hwang
7346e70109 Use MSCCL++ Docker image for CodeQL (#94) 2023-06-06 18:42:22 +08:00
Changho Hwang
0581bfb431 Fix CodeQL workflow (#80) 2023-05-22 14:03:30 +08:00
Changho Hwang
8d54bf3301 Update CI (#79) 2023-05-21 11:45:41 -07:00
Binyang Li
5704fb7c6a update 2023-05-11 08:55:51 +00:00
Binyang Li
1487596dc8 update cpplint 2023-05-11 08:34:57 +00:00
Binyang Li
669c67b3de enable github action on all ranches 2023-05-05 08:42:25 +00:00
Changho Hwang
72431957fd Use clang-format-12 2023-03-27 14:00:03 +00:00
Binyang Li
7ec6ae9d6a add cpplint and CI 2023-03-27 03:32:10 +00:00