mscclpp

mirror of https://github.com/microsoft/mscclpp.git synced 2026-05-14 02:03:03 +00:00

Author	SHA1	Message	Date
Binyang Li	0b840baa05	Update allgather fallback algo (#476) Enhancements to all-gather operation, a temporary solution to fix the memory overhead when integrating msccl++ with pytorch. This solution will not register input/output buffer to msccl++, so the temp output buffer for allgather could be reused by torch automatically. * Introduced a new `allgather8` kernel function in `apps/nccl/src/allgather.hpp` to handle larger data sizes more efficiently. This includes double buffering to hide synchronization overhead and support for both in-place and out-of-place operations. * Modified the `allgather` function to decide between `allgather6` and `allgather8` based on data size and platform, improving performance for large data sizes. Configuration and environment improvements: * Added a new environment variable `MSCCLPP_DISABLE_CHANNEL_CACHE` to control whether the channel cache is disabled, enhancing configurability. This variable is now part of the `Env` class and is logged during environment initialization. * Removed the redundant global variable `mscclppDisableChannelCache` from `src/debug.cc` and updated its usage to refer to the new environment variable.	2025-03-14 11:18:03 -07:00
Qinghua Zhou	591276f9d0	Disable channel cache (#463) Add workaround of disabling channel cache. Related runtime parameter: -x MSCCLPP_DISABLE_CHANNEL_CACHE=TRUE (Default value: False) In this PR, some other features (e.g., ncclCommSplit) come from branch binyangli/nccl-api --------- Co-authored-by: Binyang Li <binyli@microsoft.com>	2025-02-19 19:26:12 +00:00
Changho Hwang	869cdba00c	Manage runtime environments (#452) * Add `Env` class that manages all runtime environments. * Changed `NPKIT_DUMP_DIR` to `MSCCLPP_NPKIT_DUMP_DIR`.	2025-01-15 09:44:52 -08:00
Changho Hwang	0c150e5166	Fix copyright messages (#367)	2024-10-17 21:25:46 -07:00
Changho Hwang	544ff0c21d	ROCm support (#213) Co-authored-by: Binyang Li <binyli@microsoft.com>	2023-11-24 16:41:56 +08:00
Changho Hwang	60b3dd5a61	Bug fixes & resolve warnings (#107) * Fix a bug in host hashing * Fix a bug in `HostEpoch::wait()` * Remove misc warnings	2023-06-16 09:31:23 +00:00
Changho Hwang	9cee6c4a74	Cleanup old files and functions (#86)	2023-06-01 17:34:57 +08:00
Olli Saarikivi	9f6c48cbf9	Format all files	2023-05-11 00:23:14 +00:00
Changho Hwang	d2c2ae72a7	Some cleanup	2023-04-11 08:45:22 +00:00
Changho Hwang	fe1d7fee9e	Bug Fix: null-termination in logging	2023-03-31 05:25:07 +00:00
Saeed Maleki	32c4498fb8	typo fixes	2023-03-28 00:55:41 +00:00
Saeed Maleki	75036c0f12	typo fixes	2023-03-28 00:50:59 +00:00
Saeed Maleki	43c52367fb	merged with main and simplified the callback requirements	2023-03-27 23:41:27 +00:00
Saeed Maleki	19bf369dc1	link format correction	2023-03-27 20:40:15 +00:00
Changho Hwang	8fc8f5b4fe	Lint	2023-03-27 14:09:26 +00:00
Changho Hwang	8e4146aba9	Add mscclppSetLogHandler	2023-03-27 13:33:07 +00:00
Changho Hwang	ae01fa4958	Remove mscclpp_net.h and net.h	2023-03-14 08:32:19 +00:00
Saeed Maleki	0902ce89c6	compiles	2023-02-06 05:32:24 +00:00
v-xiaoxshi	200f5637bb	more bootstrap files	2023-02-04 05:07:48 +00:00
Changho Hwang	82fe0b667d	Add a makefile and logging functions	2023-02-03 12:29:27 +00:00

20 Commits
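
The two most recent commits refer to a boolean runtime switch, `MSCCLPP_DISABLE_CHANNEL_CACHE` (set to `TRUE` to enable the workaround, default `False`). As a rough illustration of how such a switch might be read, here is a minimal C++ sketch. It is not the code from the repository's `Env` class: the helper names `parseBoolFlag` and `channelCacheDisabled` are hypothetical, and the accepted spellings (`1`/`true`/`0`/`false`, case-insensitive) are an assumption.

```cpp
#include <algorithm>
#include <cctype>
#include <cstdlib>
#include <string>

// Hypothetical helper (not part of mscclpp): interpret a raw environment
// value as a boolean. Unset or unrecognized values yield defaultValue.
bool parseBoolFlag(const char* raw, bool defaultValue) {
  if (raw == nullptr) return defaultValue;
  std::string value(raw);
  // Lower-case the value so that "TRUE", "True" and "true" are equivalent.
  std::transform(value.begin(), value.end(), value.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });
  if (value == "1" || value == "true") return true;
  if (value == "0" || value == "false") return false;
  return defaultValue;  // Assumption: unknown spellings fall back to the default.
}

// Hypothetical wrapper: reads the variable named in the commit messages.
// The default of false mirrors the "Default value: False" note in #463.
bool channelCacheDisabled() {
  return parseBoolFlag(std::getenv("MSCCLPP_DISABLE_CHANNEL_CACHE"), false);
}
```

With this sketch, launching a process with `MSCCLPP_DISABLE_CHANNEL_CACHE=TRUE` in its environment would make `channelCacheDisabled()` return `true`, while leaving the variable unset keeps the cache enabled.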