342 Commits

Author SHA1 Message Date
Yuhao Yang
2f06867128 Optimize MHC pipeline: DeepGemm, fused norm, fused hc_head (#24775)
Co-authored-by: Cheng Wan <chwan@rice.edu>
Co-authored-by: Chunan Zeng <zcnrex@gmail.com>
2026-05-10 19:03:37 +08:00
Baizhou Zhang
ef5e9f8aba [DSV4] Cherry pick missing commits from deepseek_v4 branch and enhance tests (#24793)
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
Co-authored-by: yueming-yuan <yym022502@gmail.com>
2026-05-09 04:15:37 -07:00
Liangsheng Yin
ba625d5290 slash command rerun UX: emoji semantics + result writeback (#24802) 2026-05-09 03:19:24 -07:00
Alison Shao
aefd8e257f Re-land #23109: rebase-required mode + fix for grep-no-match abort (#24180) 2026-05-08 15:28:57 -07:00
johnnycxm
cdf5771f91 [MUSA][17/N] ci: Add MUSA diffusion, sgl-kernel tests, and CI workflow support (#20672)
Co-authored-by: ximin.chen <ximin.chen@mthreads.com>
Co-authored-by: R0CKSTAR <xiaodong.ye@mthreads.com>
2026-05-07 20:45:21 -07:00
Liangsheng Yin
35870d55ac Deepseek V4 (#23882)
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: fzyzcjy <ch271828n@outlook.com>
Co-authored-by: ispobock <ispobaoke@gmail.com>
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
Co-authored-by: yueming-yuan <yym022502@gmail.com>
Co-authored-by: DarkSharpness <2040703891@qq.com>
Co-authored-by: Yuhao Yang <47235274+yhyang201@users.noreply.github.com>
Co-authored-by: yhyang201 <yhyang201@users.noreply.github.com>
Co-authored-by: yhyang201 <yhyang201@gmail.com>
Co-authored-by: Qiaolin Yu <90088090+qiaolin-yu@users.noreply.github.com>
Co-authored-by: Ethan (Yusheng) Su <11704492+yushengsu-thu@users.noreply.github.com>
Co-authored-by: Mingyi <27337995+wisclmy0611@users.noreply.github.com>
Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>
Co-authored-by: Yihao Wang <42559837+againstentropy@users.noreply.github.com>
2026-05-07 18:32:21 -07:00
Junlin Wu
80a6014243 [diffusion][npu][quant] Add MXFP8 quantization support for Wan2.2 Diffusion on Ascend NPU (#20922)
Co-authored-by: ronnie_zheng <zl19940307@163.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2026-05-07 21:30:56 +03:00
Baizhou Zhang
ecb786c8d7 [Kernel] Deprecate DeepGemm in sgl kernel and apply custom wheel sgl-deep-gemm (#24268) 2026-05-06 18:59:01 -07:00
Alison Shao
094b90b1ec ci: drop 1-gpu-h100-h200 shared label (#24495) 2026-05-06 01:02:31 -07:00
Kangyan-Zhou
52b4609789 [Docker] Prep for torch 2.11: cu129 fix, image validator, dep cleanup (#23593)
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
2026-05-04 00:37:55 -07:00
Liangsheng Yin
53df43d0a3 rerun-test: route deepep h200 suite to deepep runner (#24325) 2026-05-03 15:57:53 -07:00
Mick
5925572c95 [diffusion] CI: switch CI data references to sgl-project/ci-data (#24299) 2026-05-03 23:05:12 +08:00
Mohammad Miadh Angkad
b7d8ceb444 [CI] Keep custom sgl-kernel wheel in CUDA CI (#24291) 2026-05-02 21:53:21 -07:00
Brayden Zhong
88bb5dffe4 [Dependency] Upgrade to Torch 2.11.0 (#21247)
Co-authored-by: Kangyan Zhou <zky314343421@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
Co-authored-by: b8zhong <b8zhong@users.noreply.github.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
2026-05-02 12:25:36 -07:00
Kangyan-Zhou
24a6b3084d [CI] drop --prerelease allow from uv pip install suffix (#24265)
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>
2026-05-02 12:25:01 -07:00
Kangyan-Zhou
2e72a36420 [CI] Restore SMG e2e on 2-gpu-h100 / 4-gpu-h100 runners (#24222)
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 23:55:20 -07:00
Mick
b7d4647568 [diffusion] CI: change ground truth repo (#24219) 2026-05-01 21:25:40 -07:00
ishandhanani
5b7ce417d0 [P/D disagg] - support decode side radix cache (#19746) 2026-05-01 21:55:34 +08:00
Alec
9d95783603 Add Docker image provenance metadata (#24090)
Co-authored-by: Kangyan-Zhou <zky314343421@gmail.com>
2026-04-30 21:40:42 -07:00
Alison Shao
694ef516cb Revert "[ci] split stage-c-test-4-gpu-b200 to enable a low-disk runner pool" (#24163)
Co-authored-by: Alison Shao <alisonshao@radixark.ai>
2026-04-30 15:57:19 -07:00
Alison Shao
dc395bc059 ci: run setup_ld_library_path before install_sglang_kernel (#24141) 2026-04-30 10:55:45 -07:00
Alison Shao
7bb7f6049a ci: add per-host utilization view to runner-utilization report (#24102) 2026-04-30 10:05:16 -07:00
Mick
0b1fbdba15 [diffusion] CI: change ground truth upload path and improve publish script (#24120) 2026-04-30 12:26:10 +08:00
Alex Nails
c3ab5bec7d ci: consolidate rust + protoc install across workflows (#23700)
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 13:39:02 -07:00
Kangyan-Zhou
d4040e7010 [CI] Broaden stage-b-test-1-gpu-large runner pool to H100 + H200 (#24080)
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 12:18:10 -07:00
shuwenn
03147f66b8 ci: add /rerun-group to rerun all registered tests in a group (#24023)
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 10:24:16 -07:00
jacky.cheng
180bb2624f [AMD] Fix CI RuntimeError: opentelemetry package is not installed (#23940)
Co-authored-by: Bingxu Chen <bingxche@amd.com>
2026-04-29 18:02:44 +08:00
Kangyan-Zhou
feec1ac7f9 ci: clean up stale-CUDA mooncake variant in install_extra_deps (#23960)
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 19:32:38 -07:00
Mick
144038fbae [diffusion] chore: change default seed to 42 (#23836) 2026-04-28 20:39:23 +08:00
Lianmin Zheng
a4facdf3f6 [CI] Refactor ci_install_dependency.sh into standalone functions (#23592) 2026-04-24 17:39:39 -07:00
YC Yen-Ching Tseng
b060a5ccfd [AMD] Fix nightly version tag selection (#23644)
Co-authored-by: bingxche <bingxche@amd.com>
2026-04-24 17:39:47 +08:00
YC Yen-Ching Tseng
30909cbeeb [AMD] upd local registry address (#23607)
Co-authored-by: bingxche <bingxche@amd.com>
2026-04-24 12:07:09 +08:00
Mick
c0166355ae [diffusion] CI: minor refactor CI (#23576) 2026-04-24 08:48:31 +08:00
Sahithi Chigurupati
9891572c4a [CI] Export GB200 nightly logs to S3 (#23502) 2026-04-23 15:35:10 -07:00
Jia Guo
6428392b6f ci: fix cu129 wheel tagging + pipefail-abort in install script (follow-up to #23497) (#23587) 2026-04-23 14:52:58 -07:00
zijiexia
6490afe36e [docs] add deprecation notice banner to legacy documentation site (#23516) 2026-04-22 20:17:53 -07:00
Jia Guo
b3e6cf60aa ci: build sgl-kernel wheels for both cu129 and cu130 (#23497) 2026-04-22 18:08:36 -07:00
Kangyan-Zhou
c689f774a4 [CI] /rerun-stage: fix workflow-run URL lookup for sgl-kernel PRs (#23510)
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 17:48:38 -07:00
Sahithi Chigurupati
9591033179 [CI] GB200 nightly: on-demand PR/branch image build and config filter (#23086) 2026-04-22 13:51:25 -07:00
Kangyan-Zhou
14ac14287c [CI] /rerun-stage: auto-include wheel build when PR modifies sgl-kernel/ (#23492)
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 11:28:06 -07:00
Shangming Cai
fa85bdf4ed chore: bump mooncake version to 0.3.10.post2 (#23439) 2026-04-22 15:01:47 +08:00
Jia Guo
286fba2073 ci: use rerun_failed_jobs for skipped workflows in /rerun-failed-ci (#23008)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-21 23:59:15 -07:00
Alison Shao
04b1caf75b ci: enable /rerun-test for multimodal gen PR tests (#22828) 2026-04-21 21:34:14 -07:00
Yuhao Yang
f41f1a74a4 [diffusion] chore: support custom output folder name in GT generation workflow (#23422) 2026-04-22 11:18:21 +08:00
Kangyan-Zhou
77fd86f89e [ci] split stage-c-test-4-gpu-b200 to enable a low-disk runner pool (#23417)
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 18:33:33 -07:00
Alison Shao
0e165ffbfc ci: enable /rerun-test for nightly test suites (#22830) 2026-04-21 18:28:10 -07:00
Bingxu Chen
09b1d10d59 [AMD] prepare for MI300x PR runner pool: registry mirror, runner routing, threshold tuning (#23156) 2026-04-21 00:58:23 -07:00
zijiexia
900aad5f72 [Docs] Sync docs_new with legacy docs and update migration redirects (#23337)
Co-authored-by: Mingyi <wisclmy0611@gmail.com>
2026-04-21 00:15:17 -07:00
Liangsheng Yin
6cc2eee50d [misc] CI hygiene: enforce __main__ entry, drop silent-skipped tests, fix rerun-test protoc (#23305) 2026-04-20 21:16:24 -07:00
Cheng Wan
ebcc2b3eec ci: run weekly est_time update on Monday using p90 of last 15 runs (#23120)
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 14:39:27 -07:00