mscclpp

mirror of https://github.com/microsoft/mscclpp.git synced 2026-05-12 17:26:04 +00:00

Author	SHA1	Message	Date
Pedram Alizadeh	97eaca2bd2	[NPKIT] Adding the NPKIT support for kernel allreduce7 in mscclpp-nccl (#399 )	2025-01-03 20:38:57 +00:00
Qinghua Zhou	ba0d0d68b8	Enhance the nccl error message handling (#434 ) Add WARN or INFO before returning the nccl error message. Change NCCL_DEBUG to MSCCLPP_DEBUG in debug message.	2025-01-03 00:50:36 +00:00
Changho Hwang	e2230aab26	Tackle build warnings (#422 ) * Comply with [CMP0165](https://cmake.org/cmake/help/latest/policy/CMP0165.html) * Tackle other warnings during build	2024-12-19 16:51:50 -08:00
SreevatsaAnantharamu	0c7ed2c674	Add ncclBcast / ncclBroadcast support (#419 ) A simple broadcast using scratch buffer and option to use an executor.	2024-12-19 01:16:30 +00:00
David Sidler	d8d0dfbffa	Fix synchronization in allreduce8 kernel (#407 ) Running kernel allreduce8 across 64 vGPUs (in CPX mode) revealed a synchronization bug. The PR addresses it by ensuring that signals are only issued after all threads in the block have issued their writes to guarantee correct ordering between data writes and signal writes. --------- Co-authored-by: Changho Hwang <changhohwang@microsoft.com>	2024-12-18 17:10:22 -08:00
Caio Rocha	774602d49c	Supporting Executor multi node in NCCL API (#412 ) Co-authored-by: Binyang Li <binyli@microsoft.com>	2024-12-18 15:50:58 -08:00
Binyang Li	fcb2e46cb1	NVLS support for NCCL API (#410 ) Co-authored-by: Qinghua Zhou <qinghuazhou@microsoft.com> Co-authored-by: Changho Hwang <changhohwang@microsoft.com>	2024-12-18 09:55:35 +00:00
Binyang Li	88d28e07a7	Select algo according to json config (#396 ) The way to run nccl-test over mscclpp: mpirun -np 8 --bind-to numa --allow-run-as-root -x LD_PRELOAD=$(pwd)/build/apps/nccl/libmscclpp_nccl.so -x NCCL_DEBUG=WARN -x MSCCLPP_EXECUTION_PLAN_DIR=/execution-files /root/nccl-tests/build/all_reduce_perf -b 1K -e 1G -f 2 -d half -G 20 -w 10 -n 20	2024-12-03 22:39:20 +00:00
Caio Rocha	d9c297ba14	AllGather Executor Support in NCCL Interface (#393 ) Co-authored-by: Ziyue Yang <ziyyang@microsoft.com> Co-authored-by: Changho Hwang <changhohwang@microsoft.com> Co-authored-by: Binyang Li <binyli@microsoft.com>	2024-11-27 17:05:51 -08:00
Caio Rocha	93628d2066	Fixing Message Boundary AllReduce Fallback Code (#391 )	2024-11-23 12:15:56 -08:00
Changho Hwang	2127a3ba29	Improve CMake options (#376 ) * Let all CMake option names start with `MSCCLPP_` * Explain the `MSCCLPP_BUILD_PYTHON_BINDINGS` option in readme --------- Co-authored-by: Binyang Li <binyli@microsoft.com>	2024-11-22 01:54:11 +00:00
Binyang Li	28a57b0610	NVLS support for msccl++ executor (#375 ) - Support more datatypes for multicast operation - Add new OP MULTI_LOAD_REDUCE_STORE to support NVLS - Modify allocSharedPhysicalCuda, which returns std::shared_ptr<T> instead of std::shared_ptr<PhysicalCudaMemory> - Add Python support for allocSharedPhysicalCuda Test passed for `allreduce_nvls.json`	2024-11-20 06:43:28 +00:00
Binyang Li	4136153a76	[Doc] mscclpp docs (#348 ) Generate docs for mscclpp. Set up github action to auto-deploy github-page doc link here: https://microsoft.github.io/mscclpp --------- Co-authored-by: Changho Hwang <changhohwang@microsoft.com> Co-authored-by: Caio Rocha <caiorocha@microsoft.com>	2024-10-18 06:08:31 +00:00
Changho Hwang	f8c0bcca2b	Perf optimization & support clipping (#364 ) Co-authored-by: Nusrat Islam <Nusrat.Islam@amd.com>	2024-10-16 14:35:08 -07:00
Changho Hwang	e9294357c5	Fix NCCL API bugs (#363 )	2024-10-16 14:16:34 -07:00
Binyang Li	b30bb260e3	Tune threads per block for mscclpp executor (#345 )	2024-09-18 17:21:47 -07:00
Changho Hwang	1e82dd444f	Make ibverbs optional at compile time (#340 ) Co-authored-by: Caio Rocha <caiorocha@microsoft.com>	2024-08-21 12:47:05 -07:00
Changho Hwang	8c6fb429e9	bfloat16 support (#336 ) * Add bfloat16 support for executor and NCCL interface * Changed `gpu_data_types.hpp` into an internal header file	2024-08-12 15:41:58 -07:00
caiomcbr	67eb9b04cc	NCCL API Executor Integration (#331 ) Co-authored-by: Changho Hwang <changhohwang@microsoft.com>	2024-07-25 15:05:02 -07:00
caiomcbr	7493e2f075	Double buffering for NCCL APIs (#324 ) Using two scratch buffers in each peer to exchange data. --------- Co-authored-by: Changho Hwang <changhohwang@microsoft.com>	2024-07-15 22:18:53 +00:00
Changho Hwang	c4ca2fbc8c	Resolve clang++ warnings (#325 )	2024-07-11 07:48:35 +00:00
caiomcbr	f4c3c8f916	AllReduce Kernel for Small Messages (#322 ) Adding allreduce kernel code for message sizes smaller than 32 bytes, when the number of elements is smaller than the number of ranks. --------- Co-authored-by: Caio Rocha <caio.rocha@microsoft.com> Co-authored-by: Changho Hwang <changhohwang@microsoft.com>	2024-07-05 21:08:43 +00:00
caiomcbr	b1b9d0626c	Support NCCL APIs (#319 ) Start supporting NCCL APIs with a few limitations. --------- Co-authored-by: Caio Rocha <caio.rocha@microsoft.com> Co-authored-by: Changho Hwang <changhohwang@microsoft.com>	2024-06-27 23:54:06 +00:00

23 Commits