Commit Graph

26 Commits

Author SHA1 Message Date
Binyang Li
af0bb86e07 Merge mscclpp-lang to mscclpp project (#442)
First step to merge msccl-tools into mscclpp repo. In this step will
move all msccl related code, pass the current tests and do some
necessary refactor.

Add `mscclpp.language` module
Add `_InstructionOptimizer` and `DagOptimizer` class to optimize the dag
Add `DagLower` to lower dag to intermediate representation 
Add documents for mscclpp.language
Remove msccl related code
2025-01-22 09:47:37 -08:00
Changho Hwang
34945fb107 Add GpuBuffer class (#423)
* Renamed and moved mem alloc functions into the `mscclpp::detail::`
namespace (now `mscclpp::detail::gpuCalloc*<T>()`)
* Deprecated constructor-calling mem alloc functions
(`mscclpp::makeShared*<T>()` and `mscclpp::makeUnique*<T>()`)
* Added a new `mscclpp::GpuBuffer<T>()` class that should be used in
general for allocating communication buffers
* Added a new `mscclpp.utils.GpuBuffer` Python class that inherits
`cupy.ndarray` and allocates using `mscclpp::gpuMemAlloc`
* Renamed `mscclpp::memcpyCuda*<T>()` functions into
`mscclpp::gpuMemcpy*<T>()` for name consistency
* A few fixes in NVLS memory allocation
* Tackled minor compiler warnings
2025-01-07 18:40:01 -08:00
Binyang Li
3d6bfed2cf Update version number (#433)
Co-authored-by: github-actions <github-actions@github.com>
2025-01-02 16:45:08 -08:00
Binyang Li
863a599360 Disable CuMemMap check for ROCm (#411)
Co-authored-by: Changho Hwang <changhohwang@microsoft.com>
2024-12-17 08:36:25 +00:00
Changho Hwang
756f24c697 Revised ProxyChannel interfaces (#400)
* Renamed `ProxyChannel` -> `BaseProxyChannel` and `SimpleProxyChannel`
-> `ProxyChannel`. It makes the interface more consistent by defining
channels to be associated with a certain src/dst memory region:
`ProxyChannel` as "sema + src/dst + fifo" and `SmChannel` as "sema +
src/dst". BaseProxyChannel is not associated with any memory regions, as
"sema + fifo".
* `ProxyChannelDeviceHandle` now inherits from
`BaseProxyChannelDeviceHandle`, instead of having one as a member.
2024-12-06 10:53:34 -08:00
Changho Hwang
2127a3ba29 Improve CMake options (#376)
* Let all CMake option names start with `MSCCLPP_`
* Explain the `MSCCLPP_BUILD_PYTHON_BINDINGS` option in readme

---------

Co-authored-by: Binyang Li <binyli@microsoft.com>
2024-11-22 01:54:11 +00:00
Binyang Li
28a57b0610 NVLS support for msccl++ executor (#375)
- Support mote datatype for multicast operation
- Add new OP MULTI_LOAD_REDUCE_STORE to support NVLS
- Modify allocSharedPhysicalCuda, which return std::shared_ptr<T>
instead of std::shared_ptr<PhysicalCudaMemory>
- Add Python support for allocSharedPhysicalCuda

Test passed for `allreduce_nvls.json`
2024-11-20 06:43:28 +00:00
Binyang Li
4136153a76 [Doc] mscclpp docs (#348)
Generate docs for mescclpp.
Setup github action to auto-deploy github-page
doc link here: https://microsoft.github.io/mscclpp

---------

Co-authored-by: Changho Hwang <changhohwang@microsoft.com>
Co-authored-by: Caio Rocha <caiorocha@microsoft.com>
2024-10-18 06:08:31 +00:00
Changho Hwang
40cb196553 v0.5.2 (#328) 2024-07-16 00:35:18 +00:00
caiomcbr
b1b9d0626c Support NCCL APIs (#319)
Start supporting NCCL APIs with a few limitations.

---------

Co-authored-by: Caio Rocha <caio.rocha@microsoft.com>
Co-authored-by: Changho Hwang <changhohwang@microsoft.com>
2024-06-27 23:54:06 +00:00
Angelica Moreira
34f4d9d006 Update quickstart.md (#314)
Updating the docker image name tag and the python benchmark path.
2024-06-19 22:26:13 +00:00
Changho Hwang
cddffbc8b6 v0.5.1 (#308) 2024-05-26 14:31:29 -07:00
Changho Hwang
9c2a96060a v0.5.0 (#298) 2024-05-04 16:51:48 -07:00
Changho Hwang
1a7cb98e3a v0.4.3 (#279) 2024-03-27 11:53:09 -07:00
Changho Hwang
cdaf3aea3d New packet format & optimizations (#256)
Co-authored-by: Binyang Li <binyli@microsoft.com>
2024-02-20 20:01:37 -08:00
Changho Hwang
f1605b73d6 v0.4.2 (#236) 2023-12-18 11:42:58 +08:00
Binyang Li
f1b2c9df12 Fix performance downgrade issue & update doc (#229)
For push function, we only need to make sure the instruction `st.global`
will be executed after the while loop. Since there is a Write-After-Read
hazard for `trigger.fst` (Check `this->triggers[curFifoHead % size].fst
!= 0` first then write value to `triggers[curFifoHead % size]`), we can
expect the compiler and hardware can handle this situation correctly.
Remove the `release.sys` there.

BTW, `st.global.release.sys.v2.u64` will cause perf regression issue.
Previous we use `st.global.release.cta.v2.u64`, but seems not necessary.
2023-12-04 10:20:10 -08:00
Changho Hwang
351b95b926 Update documents (#225)
Adding AMD supports on the docs
2023-11-24 17:00:18 +08:00
Changho Hwang
15f6dcca49 Update documentation (#217)
Co-authored-by: Saeed Maleki <saemal@microsoft.com>
2023-11-22 12:58:04 -08:00
Changho Hwang
f68820436c Explicit build dependency on nvidia_peermem (#201) 2023-10-23 04:29:30 +00:00
Changho Hwang
3df18d20a3 Update install guidelines (#159) 2023-08-30 10:40:40 -07:00
Changho Hwang
4114d65c60 Documents & minor updates (#119)
Co-authored-by: Saeed Maleki <saemal@microsoft.com>
Co-authored-by: Binyang Li <binyli@microsoft.com>
2023-07-07 17:35:05 +08:00
Changho Hwang
6ec585f3d8 Packet copy for IB (#109)
* Extend channels to support LL with IB
* Rename classes and interfaces
2023-06-28 10:39:31 -07:00
Changho Hwang
85e664c2f7 Update docs (#88) 2023-06-05 13:13:10 +08:00
Changho Hwang
9cee6c4a74 Cleanup old files and functions (#86) 2023-06-01 17:34:57 +08:00
Ziyue Yang
48a278d2a5 init doxyfile 2023-05-10 16:23:02 +00:00