Commit Graph

  • 00ca2a2051 ext/ep: lift cached_notify nc cap, strided-warp head fixup (HT 442/399 GB/s) [qinghuazhou/expert_parallel_gb200] Qinghua Zhou 2026-05-11 05:32:57 +00:00
  • ce1554bd1f ext/ep: fix kRDMASender epilogue tail-write race (unblocks chunk_send>16) Qinghua Zhou 2026-05-11 03:40:55 +00:00
  • 01a10e00de ext/ep: HT perf - lower lazy head-feedback threshold to chunk/4 Qinghua Zhou 2026-05-10 19:21:25 +00:00
  • e0a1bb2c42 ext/ep: WIP Phase 4 fix NVLS self-overcount + cached_notify NVLS barrier Qinghua Zhou 2026-05-10 07:23:36 +00:00
  • f2228b07bb ext/ep: WIP Phase 4 fabric-VA cross-node tail/head counter (bypass NVLS multimem.red) Qinghua Zhou 2026-05-10 06:07:46 +00:00
  • bf0a7e788a ext/ep: WIP Phase 4 normalize fence/sync ordering in combine writer Qinghua Zhou 2026-05-10 03:58:20 +00:00
  • 28f1d722e1 ext/ep: WIP Phase 4 NVLS HT B2 cooperative-copy + relaxed multimem.red Qinghua Zhou 2026-05-10 01:33:14 +00:00
  • 4569c4e751 Phase 11: hybrid NVLink + RDMA LL dispatch (+70% throughput) [qinghuazhou/expert_parallel_ll_opt] Qinghua Zhou 2026-05-09 23:04:15 +00:00
  • 5f219b5cda Phase 10: validate vs nccl-ep — prior 30 GB/s gap was apples-to-oranges Qinghua Zhou 2026-05-09 22:22:07 +00:00
  • 591fe8272b ext/ep: WIP Phase 4 NVLS HT B2 third multimem barrier in notify_dispatch Qinghua Zhou 2026-05-09 21:39:46 +00:00
  • 46701d4161 ext/ep: WIP Phase 4 NVLS HT B2 fabric-IPC data path Qinghua Zhou 2026-05-09 20:59:34 +00:00
  • 04f047fc5a Phase 9: multi-NIC striping (NEGATIVE RESULT) Qinghua Zhou 2026-05-09 20:22:00 +00:00
  • 3ab2e43b79 ext/ep: NVLS HT B2 phases 1-3 (notify_dispatch barrier + counter fast path) Qinghua Zhou 2026-05-09 19:25:29 +00:00
  • 9d729d795e ext/ep: document Phase 8 - combine TMA/cp.async declined after profiling qinghuazhou 2026-05-09 17:56:54 +00:00
  • f1808501e9 ext/ep: document Phase 7 attempts (multi-SGE, CUDA Graph, skew) qinghuazhou 2026-05-09 17:32:16 +00:00
  • addb6932b8 ext/ep: tune IBGDA LL grid to (1,32) — +6% dispatch / +5% combine qinghuazhou 2026-05-09 05:31:54 +00:00
  • 825fc124a5 address hang issue [binyli/mnnvl] Binyang Li 2026-05-09 03:16:33 +00:00
  • 8f2c4e7d98 ext/ep: route LL internode atomics over NVL72 fabric (Proposal A) Qinghua Zhou 2026-05-08 22:04:35 +00:00
  • e5ccac520c ext/ep: env-tunable IBGDA channel count (MSCCLPP_EP_IBGDA_CHANNELS) Qinghua Zhou 2026-05-08 21:56:31 +00:00
  • b557fe289a ext/ep: split combine 'reduce' bucket into grid_sync + reduce-arith Qinghua Zhou 2026-05-08 21:50:00 +00:00
  • f63bf15378 ext/ep: log per-rank IB device + add MSCCLPP_EP_IB_DEVICE_OVERRIDE Qinghua Zhou 2026-05-08 19:30:17 +00:00
  • 0e46d3052a ext/ep: per-warp_group dispatch profile slots + sampled readback Qinghua Zhou 2026-05-08 18:38:08 +00:00
  • e208cc326b WIP Binyang Li 2026-05-08 04:30:05 +00:00
  • 5516bdbb6b fix Binyang Li 2026-05-08 04:22:50 +00:00
  • 654bcfa6ba update Binyang Li 2026-05-08 03:54:32 +00:00
  • 9ff7e1c2c3 update Binyang Li 2026-05-08 03:43:34 +00:00
  • 113d859d13 fix Binyang Li 2026-05-08 03:00:53 +00:00
  • 5d16ac958e EP GB200 (4 GPUs/node) support Qinghua Zhou 2026-05-08 01:42:21 +00:00
  • fec40601b8 ext/ep: add opt-in in-kernel timestamp profiling for LL dispatch/combine Qinghua Zhou 2026-05-07 21:35:45 +00:00
  • 6547d7756c wip [caiorocha/fix_er_tol] Caio Rocha 2026-05-07 07:10:47 +00:00
  • 04ebba7563 ext/ep: GPU-initiated IBGDA path for low-latency dispatch/combine Qinghua Zhou 2026-05-07 05:14:15 +00:00
  • d1b04a3b26 NVLS zero-copy allreduce: support FP16 accumulator for FP8 inputs Binyang Li 2026-05-07 00:38:31 +00:00
  • 7d80a33360 Default torch example SYMMETRIC_MEMORY env to 1 Binyang Li 2026-05-06 23:43:37 +00:00
  • e8caab7c8e Strip preflight validation blocks from NVLS pipeline allreduce kernels Binyang Li 2026-05-06 23:04:41 +00:00
  • 639b80de7b Tie AllreduceAllpairPacket maxBlockNum_ to MAX_IPC_DOMAIN_NRANKS - 1 Binyang Li 2026-05-06 22:31:15 +00:00
  • 095cfff11d Revert RSAG nBlocks default to 64 Binyang Li 2026-05-06 22:23:18 +00:00
  • bde23ce38e Revert verbose RSAG zero-copy comment; rename NRanksPerNode template param Binyang Li 2026-05-06 22:16:08 +00:00
  • f0c6ac081f Fold validateIpcDomainSpansWorld into getIpcDomainNranks Binyang Li 2026-05-06 21:49:48 +00:00
  • 307a471888 Shorten verbose comments and use THROW in validateIpcDomainSpansWorld Binyang Li 2026-05-06 21:37:09 +00:00
  • 4a0d5b29d5 Simplify torch-integration tuning example Binyang Li 2026-05-06 21:14:36 +00:00
  • 905b23d9a8 Drop non-MNNVL multi_node regime from torch-integration example Binyang Li 2026-05-06 19:00:22 +00:00
  • 9aeeaf0f12 Simplify torch-integration tuning example for MPI-only multi-node testing Binyang Li 2026-05-06 18:51:29 +00:00
  • e87c66a85d ext/ep: apply clang-format and black to fix CI lint failures [qinghuazhou/expert_parallel] Qinghua Zhou 2026-05-06 04:12:20 +00:00
  • 01032fa167 core: TODO notes on CUDA-IPC atomicAdd context/flush caveats Qinghua Zhou 2026-05-06 03:44:10 +00:00
  • 23e8ce6dbe ext/ep: add pragma once to event.hpp and update validation docs Qinghua Zhou 2026-05-06 03:24:34 +00:00
  • c641487c55 ext/ep: fix SWITCH_* macros and add missing standard headers Qinghua Zhou 2026-05-06 03:18:39 +00:00
  • 3b96b5ab6e disable flashinfer version [rjsouza/sglang-tests] empyreus 2026-05-06 03:12:16 +00:00
  • b2880652ce ext/ep: remove unused mscclpp_ep CMake target Qinghua Zhou 2026-05-06 03:09:04 +00:00
  • 075a43ade7 ext/ep: remove outdated single-rank smoke test Qinghua Zhou 2026-05-06 03:09:04 +00:00
  • 5178155be8 ext/ep: add MIT license headers to EP sources and tests Qinghua Zhou 2026-05-06 02:42:49 +00:00
  • 89f17dab5b Potential fix for pull request finding Qinghua Zhou 2026-05-05 19:28:37 -07:00
  • 89cb62d047 Potential fix for pull request finding Qinghua Zhou 2026-05-05 19:24:28 -07:00
  • 1ca7b65db7 host file empyreus 2026-05-05 22:33:37 +00:00
  • 783c73b5d9 try to resolve single empyreus 2026-05-05 21:27:30 +00:00
  • 1b9f335ddd wip [caiorocha/4_nodes_allreduce] Caio Rocha 2026-05-05 21:19:23 +00:00
  • 822fbb2351 Adding necessary macros for enabling mrc support (#797) [main] Mahdieh Ghazi 2026-05-05 17:17:41 -04:00
  • 47494ea75a check sglang versions empyreus 2026-05-05 21:10:25 +00:00
  • 1fe41e05f5 sglang empyreus 2026-05-05 20:17:05 +00:00
  • 22a0953a20 fix vmss name empyreus 2026-05-05 20:02:49 +00:00
  • f61407ed91 try to install everything not just python empyreus 2026-05-05 18:47:23 +00:00
  • 77cb3675b5 fix eof empyreus 2026-05-05 17:57:52 +00:00
  • 6296803d87 Make NVLS non-zero-copy allreduce algorithms MNNVL-ready Binyang Li 2026-05-05 04:41:14 +00:00
  • 987f80025a Merge remote-tracking branch 'origin/main' into binyli/mnnvl Binyang Li 2026-05-04 23:53:25 +00:00
  • 7ca634321a trying without python empyreus 2026-05-04 22:57:37 +00:00
  • 3e655e5e02 remove cd empyreus 2026-05-04 22:20:17 +00:00
  • 528603856c Merge branch 'main' into caiorocha/4_nodes_allreduce Binyang Li 2026-05-04 15:12:10 -07:00
  • 9ec26fa4d1 Reset GPU tokens before reuse (#795) Binyang Li 2026-05-04 15:11:47 -07:00
  • 1f3f81b519 make fixes empyreus 2026-05-04 21:20:35 +00:00
  • 3f86480097 add container name to deploy empyreus 2026-05-04 20:42:56 +00:00
  • 812d43d406 return failed result for new test empyreus 2026-05-04 20:40:12 +00:00
  • f6637cc458 attempt to print nvidia-smi for cuda drivers empyreus 2026-05-04 20:11:21 +00:00
  • 21197f7c0a change directory empyreus 2026-05-04 20:02:34 +00:00
  • dfdc9f701e update pool empyreus 2026-05-04 18:29:54 +00:00
  • eaa611f220 split multi node test empyreus 2026-05-04 18:10:42 +00:00
  • de244e528b update sglang bench empyreus 2026-05-04 18:04:30 +00:00
  • 97a4b1aa69 remove duplicate stop empyreus 2026-05-04 17:23:01 +00:00
  • cb430b35d4 clean up deploy empyreus 2026-05-04 17:21:56 +00:00
  • a8b959946a Inital new test empyreus 2026-05-04 17:18:02 +00:00
  • e091f65143 Merge branch 'main' into rjsouza/sglang-tests empyreus 2026-05-04 17:06:18 +00:00
  • b30752a94f review: fix docstring, trailing comma, import placement, and filename mismatch copilot-swe-agent[bot] 2026-05-03 22:51:44 +00:00
  • 2a34a7ed11 Merge branch 'main' into caiorocha/4_nodes_allreduce Binyang Li 2026-05-03 15:45:22 -07:00
  • 9a36884369 Rename gpuMemset wrapper and zero TokenPool slots in deleter Binyang Li 2026-05-02 03:32:18 +00:00
  • 7bc5e0406b Reset GPU tokens before reuse Binyang Li 2026-05-02 03:19:31 +00:00
  • 1c29817566 Revert AllreduceRsAgZeroCopy non-symmetric ctx key tag back to ++tag Binyang Li 2026-05-01 23:40:11 +00:00
  • 2efda4d819 Restore compile-time templated NRanksPerNode for rsag_zero_copy Binyang Li 2026-05-01 23:09:22 +00:00
  • 2a2fca8a58 Rename collective ctx/kernel param nRanksPerNode to ipcDomainNranks Binyang Li 2026-05-01 19:06:07 +00:00
  • 45a651b2c8 Decouple IPC-domain hint from bootstrap nRanksPerNode Binyang Li 2026-05-01 18:27:17 +00:00
  • fdf7d579dc ext/ep: optional preallocated outputs for low_latency_dispatch Qinghua Zhou 2026-04-30 18:45:44 +00:00
  • 2529774868 tests/ep: intranode send-side counts unique (token, dst_node) to match NCCL-EP Qinghua Zhou 2026-04-29 23:31:47 +00:00
  • 6ad82e8bbe tests/ep: disable NCCL HeartbeatMonitor to silence mpirun shutdown noise Qinghua Zhou 2026-04-29 20:44:37 +00:00
  • e752dbaf97 tests/ep: add NCCL-EP six-metric BW breakdown (send/recv x total/nvl/rdma) Qinghua Zhou 2026-04-29 20:44:10 +00:00
  • f2feb120b8 ext/ep: refresh README to reflect current LL, proxy sharding, and bench harness Qinghua Zhou 2026-04-29 18:26:36 +00:00
  • 9213587ffe ep tests: report dispatch/combine min, avg, max time and use avg for BW Qinghua Zhou 2026-04-29 16:50:33 +00:00
  • afbdcd6a3d ep tests: clean shutdown to silence TCPStore/HeartbeatMonitor noise Qinghua Zhou 2026-04-29 05:16:22 +00:00
  • 2c52937b26 Fix FP8 ROCm build/test issues and dtype naming (#792) Binyang Li 2026-04-28 15:02:22 -07:00
  • 533f329971 Tune no-sym MNNVL with RSAG zero-copy Binyang Li 2026-04-28 16:23:23 +00:00
  • 3bc00cb7f0 Enable NVLS zero-copy without symmetric memory flag Binyang Li 2026-04-28 08:24:49 +00:00
  • 865c2bc795 Optimize MNNVL allreduce without symmetric memory Binyang Li 2026-04-28 07:55:52 +00:00
  • dded5e0e39 Improve MNNVL allreduce tuning performance Binyang Li 2026-04-28 06:41:17 +00:00
  • 893a08e69c Enable MNNVL allreduce tuning Binyang Li 2026-04-28 05:38:59 +00:00