Commit Graph

  • 4d05ce3889 fix formatting rjsouza/nvls-allgather-pr Empyreus 2026-06-27 04:10:55 +00:00
  • b90a6dcd2b fit formatting Empyreus 2026-06-27 00:05:44 +00:00
  • d68a145b1a Merge branch 'main' into rjsouza/nvls-allgather-pr Empyreus 2026-06-27 00:03:15 +00:00
  • 8689d4889c update comments, update defaults, add error handling Empyreus 2026-06-26 23:48:36 +00:00
  • 27a57232de add code changes and instructions for running spud benchmark on red tenant mahdieh/ep-benchmarking Mahdieh Ghazi 2026-06-26 16:23:41 +00:00
  • a900f009ca WIP binyli/ep Binyang Li 2026-06-26 04:39:04 +00:00
  • 3545225840 update Binyang Li 2026-06-26 04:22:40 +00:00
  • c17eaf3d78 WIP Binyang Li 2026-06-26 04:12:38 +00:00
  • 00e41b8976 ep(python): MoECommunicator mode="ht" (FLAT) + HT benchmarks via the high-level API qinghuazhou/ep_python_api_ht Qinghua Zhou 2026-06-26 02:44:35 +00:00
  • bee4ea9891 Fix Binyang Li 2026-06-26 00:19:19 +00:00
  • cb045249ea fix Binyang Li 2026-06-26 00:02:58 +00:00
  • 7b25bd32be WIP Binyang Li 2026-06-26 00:00:08 +00:00
  • 9c6b28337d Fix unit test (#823) main Binyang Li 2026-06-25 16:03:13 -07:00
  • 8c3730b495 Migrate to C++20 and drop CUDA 11 support (#822) RJ Souza 2026-06-25 12:59:58 -07:00
  • 2d0b8e2198 WIP Binyang Li 2026-06-25 18:21:25 +00:00
  • 56dc9cba63 Update port channel perf tests (#820) Changho Hwang 2026-06-25 19:21:51 +08:00
  • 49601f9cf7 rename Binyang Li 2026-06-25 05:22:25 +00:00
  • e30d64e14a WIP Binyang Li 2026-06-25 05:09:46 +00:00
  • cb3d5530e0 update Binyang Li 2026-06-25 04:14:52 +00:00
  • e9a5acc7d4 ep(python): high-level MoECommunicator HT (FLAT) dispatch/combine API Qinghua Zhou 2026-06-25 02:35:16 +00:00
  • 51cf76177e Merge branch 'main' into binyli/add-uts binyli/add-uts Binyang Li 2026-06-24 17:01:13 -07:00
  • 9aab9cacc0 support rocm7.2 (#819) Binyang Li 2026-06-24 16:09:34 -07:00
  • 6ecb19cbbf update Binyang Li 2026-06-24 21:18:26 +00:00
  • 374e8b2dc4 update Binyang Li 2026-06-24 17:14:21 +00:00
  • 354bc34d76 WIP Binyang Li 2026-06-24 04:39:12 +00:00
  • c7a2df6885 ep(intranode): TMA direct-gather combine + all-sender dispatch + per-phase SM knobs qinghuazhou/nccl_ep_dispatch_port Qinghua Zhou 2026-06-24 03:23:48 +00:00
  • 93f96e97cd Merge branch 'main' into rjsouza/nvls-allgather-pr Empyreus 2026-06-23 23:27:24 +00:00
  • 88e1a44858 wip Caio Rocha 2026-06-23 22:41:27 +00:00
  • 4c624ce09e Apply IB CI fixes chhwang/ib-ci-fix Changho Hwang 2026-06-23 17:01:44 +00:00
  • bbc9492185 Update port channel perf tests Changho Hwang 2026-06-23 16:26:49 +00:00
  • 2091a337be wip Caio Rocha 2026-06-23 09:04:05 +00:00
  • f74814142e wip Caio Rocha 2026-06-23 08:32:14 +00:00
  • bd0f15b4ef WIP Binyang Li 2026-06-22 21:50:39 +00:00
  • 5e0c1de254 WIP Binyang Li 2026-06-22 17:54:46 +00:00
  • efdbd4313c ep(ncclep): channel-adaptive warp count for combine TMA gather Qinghua Zhou 2026-06-22 02:20:28 +00:00
  • c6c1492679 ep(ncclep): combine TMA gather default 12->14 warps (-2 to -7% combine) Qinghua Zhou 2026-06-21 07:34:39 +00:00
  • 4c7e95a582 WIP Binyang Li 2026-06-21 06:28:22 +00:00
  • 42f00577cf ep(ncclep): TMA-staged flat-combine gather (default; -7 to -23% combine) Qinghua Zhou 2026-06-21 05:17:46 +00:00
  • 125cb0205e WIP Binyang Li 2026-06-21 01:40:22 +00:00
  • c3a4b641ac ep(ncclep): extract lean flat-combine gather kernel (-12 to -30% combine at low SM) Qinghua Zhou 2026-06-20 07:20:37 +00:00
  • 49a046396a ep(ncclep): raise MSCCLPP_EP_DISPATCH_NSM clamp to [1, num_sms] for the flat path Qinghua Zhou 2026-06-20 04:07:57 +00:00
  • d212966c8d ep(ncclep): inc7 atomic routing-slot assignment at low channel count (-21% dispatch @ NSM16) Qinghua Zhou 2026-06-19 21:37:37 +00:00
  • dc0b8d75f3 GB200 support: SendRecv DSL collective and per-channel executor connections (#810) Binyang Li 2026-06-19 13:19:01 -07:00
  • 683a5a7648 ep(ncclep): inc6 B-depth-3 full-token TMA tile (1 S2G/dst, -3 to -6% more dispatch) Qinghua Zhou 2026-06-18 23:19:43 +00:00
  • aca50d56fd ep(ncclep): inc6 Stage B-depth-3 whole-token TMA tiles via dynamic SMEM (-16 to -27% dispatch) Qinghua Zhou 2026-06-18 22:20:29 +00:00
  • 5daea8f8cb ep(ncclep): inc6 Stage B-depth-1 cross-token TMA pipeline (-2.5% dispatch) Qinghua Zhou 2026-06-18 21:10:12 +00:00
  • e342039f88 add comments Empyreus 2026-06-18 17:50:18 +00:00
  • 7813f3b1b0 remove execution_kernel alignment check Empyreus 2026-06-18 17:00:49 +00:00
  • 8386ed2a1f ep(ncclep): inc6 Stage A pipelined TMA sender ring (-4-5% dispatch) Qinghua Zhou 2026-06-18 16:59:11 +00:00
  • 9a02f3669f enforce scratch on broadcast packets Empyreus 2026-06-18 16:17:56 +00:00
  • 253bc05c7c add auto_sync false flag Empyreus 2026-06-18 00:13:36 +00:00
  • 9b4175412b combine groupstore and groupstorepacket Empyreus 2026-06-18 00:07:38 +00:00
  • 24187fcded use .to_dict() Empyreus 2026-06-18 00:01:41 +00:00
  • 3b5270e5d5 ep(ncclep): inc6 flat combine + decoupled dispatch/combine SM knobs Qinghua Zhou 2026-06-17 23:52:44 +00:00
  • 1805ad0db6 ep(ncclep): inc6 flat all-sender dispatch (kEpFlat, dispatch validated) Qinghua Zhou 2026-06-17 21:13:13 +00:00
  • 183dcb5daa ep(ncclep): inc5 sender-direct TMA dispatch (lane-striped ring) Qinghua Zhou 2026-06-17 07:12:22 +00:00
  • e48c6da34b fix formatting Empyreus 2026-06-17 00:11:06 +00:00
  • 1d95244cf9 tune for ptk Empyreus 2026-06-16 19:46:14 +00:00
  • f4fbd093db lint Binyang Li 2026-06-16 15:58:10 +00:00
  • de91542355 WIP Binyang Li 2026-06-16 15:56:50 +00:00
  • bb31c15dbc Merge branch 'feature/ep' into binyli/ep Binyang Li 2026-06-16 08:55:10 -07:00
  • 57fb704602 lint Binyang Li 2026-06-16 15:54:45 +00:00
  • dcd16d433d WIP Binyang Li 2026-06-16 02:58:26 +00:00
  • 74dce951bf update Binyang Li 2026-06-16 01:14:14 +00:00
  • 29847e6179 MoE Communicator design doc Binyang Li 2026-06-16 00:38:15 +00:00
  • 462ab1661a docs(ep): document MSCCLPP_EP_DIRECT and MSCCLPP_EP_INTRA_DIRECT for GB200 feature/ep Qinghua Zhou 2026-06-15 23:08:43 +00:00
  • 9c26eb4b70 add test binyli/mnnvl Binyang Li 2026-06-15 16:51:54 +00:00
  • ddf8b14d94 update for buffer pool Binyang Li 2026-06-13 00:41:05 +00:00
  • 0a26c30632 for buffer pool Binyang Li 2026-06-12 23:43:40 +00:00
  • 14f131407b ep(ncclep): inc5 combine-gather correctness at >32 ranks (16n) Qinghua Zhou 2026-06-12 05:26:09 +00:00
  • 3b6b2ac303 ep(intranode): sender direct-write dispatch (MSCCLPP_EP_INTRA_DIRECT) qinghuazhou 2026-06-11 23:07:28 +00:00
  • 1e17f20618 add allgather packet algorithm Empyreus 2026-06-11 20:46:51 +00:00
  • 02eb2cfc2e add support for allgather packet for small message sizes Empyreus 2026-06-11 20:46:09 +00:00
  • cc34e72b64 ep(ncclep): inc5 combine-direct gather (kEpDirect) qinghuazhou 2026-06-11 06:37:25 +00:00
  • 1f7942a804 ep(ncclep): inc5 keep full rdma clean for combine under kEpDirect qinghuazhou 2026-06-11 04:07:58 +00:00
  • ca829f6e8f ep(ncclep): inc5 ring-slot shrink + read-once direct write qinghuazhou 2026-06-11 02:31:19 +00:00
  • b6140b0229 ep(ncclep): increment 5 - sender direct-write dispatch (kEpDirect, dispatch-only WIP) qinghuazhou 2026-06-11 00:33:36 +00:00
  • 5d7737437a handle non 16bit aligned Empyreus 2026-06-09 18:54:38 +00:00
  • 6f61c014c1 fix missing flag Empyreus 2026-06-09 18:07:25 +00:00
  • 3b263c3324 revert useIB change Empyreus 2026-06-09 16:02:33 +00:00
  • 1e43515ae1 ep(ncclep): increment 4b (EXPERIMENTAL, gated OFF) - S2G-only TMA forwarder write qinghuazhou/nccl_ep_dispatch_port_inc4b_tma qinghuazhou 2026-06-08 23:50:24 +00:00
  • 31c930d8c5 ep(ncclep): increment 4a - VMM unicast recv pool (TMA-eligible peer mapping) qinghuazhou 2026-06-08 22:16:05 +00:00
  • 5348c4a774 refactor function for thread usage Empyreus 2026-06-08 21:53:46 +00:00
  • 1e1187aa65 reuse useIB function Empyreus 2026-06-08 21:53:16 +00:00
  • f2204ee569 improve variable names Empyreus 2026-06-08 21:02:39 +00:00
  • b864746083 python formatting fixes Empyreus 2026-06-08 20:28:34 +00:00
  • 2b87985927 update to type agnostic Empyreus 2026-06-08 19:59:40 +00:00
  • 0d8efdb43d update algo Empyreus 2026-06-08 18:24:02 +00:00
  • 00668b4a41 add allgather gstore support Empyreus 2026-06-06 01:28:12 +00:00
  • 54bfb1d3b7 add flag to disable IB Empyreus 2026-06-05 17:17:38 +00:00
  • aec37678dc update to type agnostic rjsouza/allgather-nvls Empyreus 2026-06-08 19:59:40 +00:00
  • 68a785aad8 update algo Empyreus 2026-06-08 18:24:02 +00:00
  • e1c2679506 add more logs Binyang Li 2026-06-08 17:01:50 +00:00
  • ac25cf18b6 ep(ncclep): increment 3 - cross-GPU peer-map direct-write (eliminate receiver hidden drain) qinghuazhou 2026-06-08 16:49:11 +00:00
  • 2ebf81aa35 ep(ncclep): increment-3 de-risk - DRAIN_NOOP probe + SKIP_VERIFY test gate qinghuazhou 2026-06-08 16:25:53 +00:00
  • fa07b496ae ep(ncclep): increment 2 - deepen drain-copy MLP unroll 5->28 qinghuazhou 2026-06-08 15:51:12 +00:00
  • dbef7a5f31 ep(ncclep): increment 1 - same-GPU fused direct-write to recv_x qinghuazhou 2026-06-08 15:43:53 +00:00
  • 3ad6b70d7d ep: scaffold guarded dispatch_ncclep kernel (NCCL-EP port baseline) + launch-site selection qinghuazhou 2026-06-08 07:08:55 +00:00
  • 3a9ca157f3 ep: add MSCCLPP_EP_DISPATCH_NCCLEP build guard (EP_DISPATCH_NCCLEP) for NCCL-EP-ported warp-specialized HT dispatch (default OFF) qinghuazhou 2026-06-08 06:58:13 +00:00
  • 056eadcf87 add allgather gstore support Empyreus 2026-06-06 01:28:12 +00:00