5 Commits

Author SHA1 Message Date
Binyang Li
a707273701 Torch integration (#692)
Reorganize the current native and DSL algorithm implementations.
Provide a unified API for DSL and native algorithms, along with an
interface for tuning them.
Provide an interface for PyTorch integration with both the native API
and the DSL.

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: chhwang <8018170+chhwang@users.noreply.github.com>
2026-01-21 20:32:24 -08:00
Binyang Li
610db6f023 Fix test script (#655)
Fixes #654: address a crash in correctness_test.py.

Co-authored-by: Changho Hwang <changhohwang@microsoft.com>
2025-10-21 19:57:17 +00:00
Binyang Li
5ac427610d Address teardown issue (#638)
Ignore cuda/cu errors during teardown. Some pointers may be invalid at this point.
2025-09-25 12:12:40 -07:00
Binyang Li
ba4c4aaeb8 Integrate MSCCL++ with torch workload (#626)
Integrate MSCCL++ with torch
Introduce the `NCCL audit shim library`. Users can use the following commands
to launch a torch workload. Also avoid breaking the build pipeline on CPU machines.
```bash
export LD_AUDIT=$MSCCLPP_INSTALL_DIR/libmscclpp_audit_nccl.so
export LD_LIBRARY_PATH=$MSCCLPP_INSTALL_DIR:$LD_LIBRARY_PATH
torchrun --nnodes=1 --nproc_per_node=8 your_script.py
```
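If the shim file is missing, the dynamic loader typically only prints a warning and carries on, so a mistyped install directory can go unnoticed. The sketch below guards against that by emitting the two `export` lines only when the shim is present. The function name `mscclpp_audit_env` is hypothetical and not part of MSCCL++; only the library file name and the two variables come from the commands above.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not shipped with MSCCL++: print the export lines for the
# audit shim only if the shim library exists under the given install directory.
mscclpp_audit_env() {
  local install_dir="$1"
  local shim="$install_dir/libmscclpp_audit_nccl.so"

  if [ ! -f "$shim" ]; then
    echo "audit shim not found: $shim" >&2
    return 1
  fi

  # %q shell-quotes the values so the output is safe to pass to eval.
  printf 'export LD_AUDIT=%q\n' "$shim"
  # Append the existing LD_LIBRARY_PATH only when it is non-empty.
  printf 'export LD_LIBRARY_PATH=%q\n' \
    "$install_dir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
}

# Example usage (assumes MSCCLPP_INSTALL_DIR is already set):
#   eval "$(mscclpp_audit_env "$MSCCLPP_INSTALL_DIR")" &&
#     torchrun --nnodes=1 --nproc_per_node=8 your_script.py
```

Printing the exports rather than setting them directly keeps `LD_AUDIT` out of the current shell until the check has passed.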
2025-09-09 13:28:32 -07:00
Binyang Li
2b40fe37b3 add torch test (#612)
Simple torch test
2025-08-15 10:27:21 -07:00