Torch integration (#692)

Reorganize the current native algorithm implementation and DSL algorithm implementation.
Provide a unified API for DSL algos and native algos, and provide an interface to tune the algo.
Provide an interface for PyTorch integration with the native API and the DSL.

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: chhwang <8018170+chhwang@users.noreply.github.com>
2026-04-19 22:39:11 +00:00 · 2026-01-21 20:32:24 -08:00
parent 78ce9fac8d
commit a707273701
156 changed files with 6107 additions and 4076 deletions
--- a/python/test/executor_test.py
+++ b/python/test/executor_test.py
@@ -10,7 +10,7 @@ from mscclpp import (
     npkit,
     env,
 )
-import mscclpp.comm as mscclpp_comm
+from mscclpp import CommGroup, GpuBuffer
 from mscclpp.utils import KernelBuilder, GpuBuffer, pack
 import os
 import struct
@@ -180,7 +180,7 @@ def main(
     n_iters: int = 10,
     n_graph_iters: int = 10,
 ):
-    mscclpp_group = mscclpp_comm.CommGroup(MPI.COMM_WORLD)
+    mscclpp_group = CommGroup(MPI.COMM_WORLD)
     cp.cuda.Device(mscclpp_group.my_rank % mscclpp_group.nranks_per_node).use()
     executor = Executor(mscclpp_group.communicator)
     npkit_dump_dir = env().npkit_dump_dir
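The diff moves the test from `mscclpp.comm.CommGroup` to a top-level `from mscclpp import CommGroup`. Out-of-tree scripts that must run against both layouts could resolve the name from whichever module provides it. The following is a minimal sketch, assuming only what the diff shows (that `CommGroup` lives in either `mscclpp` or `mscclpp.comm`). The helper `import_first` is hypothetical and is not part of mscclpp; this diff also does not show whether `mscclpp.comm` still exists after #692.

```python
import importlib
from types import ModuleType
from typing import Any, List, Sequence


def import_first(name: str, module_names: Sequence[str]) -> Any:
    """Return attribute `name` from the first module in `module_names`
    that can be imported and defines it.

    Hypothetical helper, not an mscclpp API. Raises ImportError listing
    every location tried if none of them provides `name`.
    """
    errors: List[str] = []
    for module_name in module_names:
        try:
            module: ModuleType = importlib.import_module(module_name)
        except ImportError as exc:  # includes ModuleNotFoundError
            errors.append(f"{module_name}: {exc}")
            continue
        if hasattr(module, name):
            return getattr(module, name)
        errors.append(f"{module_name}: no attribute {name!r}")
    raise ImportError(f"could not import {name!r}; tried: " + "; ".join(errors))


# Hypothetical usage for the two layouts seen in this diff: try the new
# top-level export first, then fall back to the old `mscclpp.comm` module.
# CommGroup = import_first("CommGroup", ["mscclpp", "mscclpp.comm"])
```

Listing the new location first means the fallback is only consulted on older installs, so code written this way keeps working once the old module path is gone.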