33 Commits

Author SHA1 Message Date
Changho Hwang
79a014976d updates 2026-03-18 20:30:18 +00:00
Changho Hwang
b64536f28e Merge branch 'main' into copilot/remove-gtest-use-custom-framework 2026-02-18 20:35:34 -08:00
Changho Hwang
4d9aceac6f badge 2026-02-18 20:25:50 -08:00
Binyang Li
d0d5a8c034 Add new CI pipeline for RCCL test (#746)
Add RCCL allreduce/allgather tests to the CI pipeline
Fix a hang issue introduced by PR #741

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-02-13 10:50:10 -08:00
Binyang Li
c12822a7af create CI pipeline for rocm (#718)
Create CI pipeline for AMD GPU.
2026-02-09 16:55:16 -08:00
Changho Hwang
9650e5c37e Update documentation (#576)
Documentation overhaul
2025-08-07 15:37:37 -07:00
Binyang Li
5e991cf5c8 update readme & bump version (#550)
Co-authored-by: github-actions <github-actions@github.com>
Co-authored-by: Changho Hwang <changhohwang@microsoft.com>
2025-07-12 01:00:18 -07:00
Changho Hwang
20eca28942 Fix a FIFO correctness bug (#549)
* Add FIFO test code that reproduces a correctness issue
* Fix the correctness issue by using pinned memory instead of cudaMemcpy

---------

Co-authored-by: Binyang Li <binyli@microsoft.com>
2025-07-11 23:53:59 +00:00
Wenxuan Tan
2151790463 Fix some typos in docs (#555) 2025-06-19 19:39:37 +00:00
Changho Hwang
908659318b Update citations (#524)
Co-authored-by: Aashaka Shah <aashaka96@gmail.com>
2025-05-13 17:52:04 -07:00
Changho Hwang
3565bfdf6d Renaming channels (#436)
Renamed `ProxyChannel` to `PortChannel` and `SmChannel` to
`MemoryChannel`
2025-01-24 14:25:31 -08:00
Binyang Li
af0bb86e07 Merge mscclpp-lang to mscclpp project (#442)
First step toward merging msccl-tools into the mscclpp repo. This step
moves all MSCCL-related code, makes the current tests pass, and does
some necessary refactoring.

Add `mscclpp.language` module
Add `_InstructionOptimizer` and `DagOptimizer` classes to optimize the DAG
Add `DagLower` to lower the DAG to an intermediate representation
Add documentation for mscclpp.language
Remove MSCCL-related code
2025-01-22 09:47:37 -08:00
Binyang Li
776f24e787 update README (#414) 2024-12-19 05:54:27 +00:00
Changho Hwang
756f24c697 Revised ProxyChannel interfaces (#400)
* Renamed `ProxyChannel` -> `BaseProxyChannel` and `SimpleProxyChannel`
-> `ProxyChannel`. It makes the interface more consistent by defining
channels to be associated with a certain src/dst memory region:
`ProxyChannel` as "sema + src/dst + fifo" and `SmChannel` as "sema +
src/dst". BaseProxyChannel is not associated with any memory regions, as
"sema + fifo".
* `ProxyChannelDeviceHandle` now inherits from
`BaseProxyChannelDeviceHandle`, instead of having one as a member.
2024-12-06 10:53:34 -08:00
Jeff Rasley
449c274326 [docs] fix quickstart link (#374)
Small fix to update quickstart link
2024-10-30 13:13:33 +08:00
Changho Hwang
8a330f9135 Update ROCm CI (#357)
Co-authored-by: Binyang Li <binyli@microsoft.com>
2024-09-20 17:57:02 +00:00
Changho Hwang
351b95b926 Update documents (#225)
Add AMD support to the docs
2023-11-24 17:00:18 +08:00
Changho Hwang
15f6dcca49 Update documentation (#217)
Co-authored-by: Saeed Maleki <saemal@microsoft.com>
2023-11-22 12:58:04 -08:00
Changho Hwang
8c0f9e84d0 v0.3.0 (#171) 2023-10-11 22:35:54 +08:00
Saeed Maleki
8d1b984bed Change device handle interfaces & others (#142)
* Changed device handle interfaces
* Changed proxy service interfaces
* Move device code into separate files
* Fixed FIFO polling issues
* Add configuration arguments to several interface functions

---------

Co-authored-by: Changho Hwang <changhohwang@microsoft.com>
Co-authored-by: Binyang Li <binyli@microsoft.com>
Co-authored-by: root <root@a100-saemal0.qxveptpukjsuthqvv514inp03c.gx.internal.cloudapp.net>
2023-08-16 20:00:56 +08:00
Binyang2014
a58e2e9623 Make sure the semaphore is not released during the lifecycle of SmChannel (#131)
Fix #126

 - Put `std::shared_ptr<SmDevice2DeviceSemaphore>` into the `SmChannel`
 - add a `DeviceHandle` struct in `SmChannel`
 - add `DeviceHandle` template

Users need to write code like this to use channels on the device side:
```cpp
// The alias must itself be a template, since mscclpp::DeviceHandle takes a type parameter
template <typename T>
using DeviceHandle = mscclpp::DeviceHandle<T>;
__device__ DeviceHandle<mscclpp::SimpleProxyChannel> channel;
__device__ DeviceHandle<mscclpp::SmChannel> smChannel;
```

To convert a channel to its device handle, call
`mscclpp::deviceHandle(channel)`, where `channel` is a `SimpleProxyChannel` or an `SmChannel`.

---------

Co-authored-by: Changho Hwang <changhohwang@microsoft.com>
2023-07-20 12:18:22 +08:00
Saeed Maleki
e7d5e652df Python bindings (#125)
Co-authored-by: Olli Saarikivi <olsaarik@microsoft.com>
Co-authored-by: Changho Hwang <changhohwang@microsoft.com>
Co-authored-by: Binyang Li <binyli@microsoft.com>
2023-07-19 15:35:54 +08:00
Changho Hwang
4114d65c60 Documents & minor updates (#119)
Co-authored-by: Saeed Maleki <saemal@microsoft.com>
Co-authored-by: Binyang Li <binyli@microsoft.com>
2023-07-07 17:35:05 +08:00
Changho Hwang
6ec585f3d8 Packet copy for IB (#109)
* Extend channels to support LL with IB
* Rename classes and interfaces
2023-06-28 10:39:31 -07:00
Changho Hwang
85e664c2f7 Update docs (#88) 2023-06-05 13:13:10 +08:00
Ziyue Yang
e257f19cb8 add doc section in readme 2023-05-11 00:46:02 +00:00
Changho Hwang
8e120bf03c Fix perf numbers in README.md 2023-04-24 18:46:34 +08:00
Changho Hwang
815cfec6e7 Update perf numbers in README.md 2023-04-24 18:31:02 +08:00
Saeed Maleki
33af4bfb67 no gdr copy anywhere in the code except for the files that are not compiled 2023-03-28 05:36:31 +00:00
Changho Hwang
0edb89dba2 Update README.md 2023-03-27 23:29:24 +08:00
Changho Hwang
798759a225 Update README.md 2023-03-10 16:16:43 +08:00
Microsoft Open Source
907d9fc948 README.md updated to template 2023-02-01 16:28:55 -08:00
Saeed Maleki
491da5a9e4 Initial commit 2023-02-01 16:24:26 -08:00