Commit Graph

1815 Commits

Author SHA1 Message Date
fzyzcjy
cc63c99f11 Enhance hook mechanism in dumper (#19073) 2026-02-22 16:13:38 +08:00
fzyzcjy
fdf80b5031 Extract framework plugins in dumper (#19072) 2026-02-22 16:10:43 +08:00
fzyzcjy
e32b5364a2 Auto annotate context in dumper (#19071) 2026-02-22 16:08:48 +08:00
fzyzcjy
8bc0751376 Support enabling partial non intrusive dump in dumper (#19069) 2026-02-22 16:07:45 +08:00
fzyzcjy
0384c459a7 Support non-intrusive dumping in dumper (#19068) 2026-02-22 16:04:02 +08:00
fzyzcjy
5eccc3cff9 Refactor dumper and change on_forward_pass_start API (#19065) 2026-02-22 16:03:27 +08:00
Liangsheng Yin
1f2da824dd [Benchmark] Remove re-exports from bench_serving.py (#19130) 2026-02-21 14:30:30 -08:00
Liangsheng Yin
3fb457b103 Tiny rename cuda_graph tests to piecewise_cuda_graph (#19128) 2026-02-21 13:46:01 -08:00
Xinyuan Tong
4a362a0e04 fix tool handling in OpenAIServingChat (#18996)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
2026-02-21 22:07:09 +08:00
Xinyuan Tong
cc451671b5 [FEAT] Add Anthropic compatible API endpoint (#18630)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
2026-02-21 19:37:38 +08:00
fzyzcjy
046ef0aa35 Support using SGLang port in dumper (#19038) 2026-02-20 12:30:24 +08:00
fzyzcjy
2fecc2c075 Support resetting and enhance HTTP endpoints for dumper (#19046)
Co-authored-by: Yueming Yuan <112649537+yueming-yuan@users.noreply.github.com>
2026-02-20 12:29:09 +08:00
fzyzcjy
503bf3047a Enhance configure and env parsing in dumper (#19034) 2026-02-20 12:28:10 +08:00
fzyzcjy
df995aab56 Support filtering labels in dumper (#19018) 2026-02-20 12:27:12 +08:00
fzyzcjy
261bca3c58 Support captured dump output and console output control in dumper (#19017) 2026-02-20 12:26:24 +08:00
fzyzcjy
fc1500adc6 Hint users when wrongly execute it with partial ranks in dumper (#19014) 2026-02-20 12:25:54 +08:00
fzyzcjy
b41d412c3d Support cleanup previous dumps in dumper (#19013) 2026-02-20 12:25:21 +08:00
Cheng Wan
73a7f0d049 Revert "Add SDAR model support" (#19032) 2026-02-19 16:03:56 -08:00
chengshuang18
44ab752b7a Add SDAR model support (#18318)
Co-authored-by: root <root@gpu-lg-cmc-h-h200-3047.host.h.pjlab.org.cn>
Co-authored-by: chengshuang <chengshuang@pjlab.org.cn>
Co-authored-by: 赵晨阳 <zhaochen20@outlook.com>
2026-02-19 11:20:32 -08:00
satyamk7054
963def7f26 Move lora request validation to tokenizer_manager from server (#18962)
Co-authored-by: Satyam Kumar <satyamk@linkedin.com>
2026-02-19 21:03:19 +08:00
pansicheng
48642d5384 [RadixTree][4/N Refactor]: Move available_and_evictable_str to individual radix cache classes (#17852) 2026-02-19 17:03:15 +08:00
shaharmor98
82a0bafc1c Feat/add fi selective state update kernel call (#18070)
Signed-off-by: Shahar Mor <smor@nvidia.com>
2026-02-19 16:56:06 +08:00
Yuwei An
0be30d4b0d Fix PCG MoE Error (#17739) 2026-02-19 16:48:06 +08:00
Ethan (Yusheng) Su
9c5aae4df5 [Fix] Add lora tied lm head support (for Qwen2.5, Gemma, etc model need) (#18634) 2026-02-19 00:34:51 +08:00
Alison Shao
34d975b18f Fix eval tests not capturing server launch failures (#18886) 2026-02-18 07:59:03 +08:00
Liangsheng Yin
83a475e8d7 feat: add cuda core dump CI warpper (#18909) 2026-02-17 14:49:26 -08:00
Minglei Zhu
bf52388354 [PCG] support piecewise cuda graph for kimi-linear model (#18849) 2026-02-17 23:31:12 +08:00
Alison Shao
7e41ac6c8d Skip flaky test_tool_choice_required_non_streaming for Mistral (#18889) 2026-02-17 12:50:55 +08:00
Shivam jindal
4f0409f8aa [Model] Add Qwen3ForRewardModel and fix Qwen3ForSequenceClassification (#17992)
Co-authored-by: yes-its-shivam <yes-its-shivam@users.noreply.github.com>
2026-02-16 19:44:41 +08:00
fzyzcjy
f554b3c27b Support dumping gradients, parameters, lazy values (#18881)
Co-authored-by: Yueming Yuan <112649537+yueming-yuan@users.noreply.github.com>
2026-02-16 13:34:06 +08:00
fzyzcjy
9a7d8d5eb0 Collect upper level metadata to dump output (#18880) 2026-02-16 13:31:19 +08:00
fzyzcjy
949792d0c6 Change dump output format to dict with value and metadata (#18879) 2026-02-16 13:30:47 +08:00
fzyzcjy
02816abc0d Flip dumper to disable by default and refactor environment handling (#18878) 2026-02-16 13:29:32 +08:00
Rain Jiang
0ffd0a3995 Nsa trtllm mla sparse fp8 support with Deepseek v3.2 NVFP4 (#18389) 2026-02-16 09:29:54 +08:00
Mohammad Miadh Angkad
8290171f52 [CI] Remove --mem-fraction-static 0.93 from gpt-oss test (#18869) 2026-02-16 09:24:11 +08:00
Chanh Nguyen
597d17dd18 Use ephemeral nccl port via get_free_port() (#18009)
Co-authored-by: Chanh Nguyen <cnguyen@linkedin.com>
2026-02-16 00:32:47 +08:00
Zack Yu
536ed3143b test: add test for Modelopt FP8 on SM90 (#18463)
Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-16 00:29:37 +08:00
SoluMilken
07a24f1a38 update pre-commit config (#18860) 2026-02-16 00:18:31 +08:00
Michael
88010e9601 [AMD] Fix nightly 1-GPU test failures and bench_serving regression (#18761)
Co-authored-by: michaelzhang-ai <michaelzhang-ai@users.noreply.github.com>
2026-02-15 20:36:47 +08:00
fzyzcjy
90555a0228 Add missing dumper tests (#18859) 2026-02-15 18:42:57 +08:00
fzyzcjy
4c7f986c6b Extract dumper and prefill delayer tests common utils (#18857) 2026-02-15 18:33:23 +08:00
Bhavneek Singh
1ce3420784 Model: Support IBM Granite (Dense/Mamba + MoE) (#18040) 2026-02-15 11:24:41 +08:00
Xiaoyu Zhang
c29394e3c8 [kernel slimming] Move fast_hadamard_transform to jit_kernel (#18475) 2026-02-14 23:06:21 +08:00
Kangyan-Zhou
ae95869292 Enable SGLANG_ENABLE_SPEC_V2 for nightly speculative decoding tests (#18719)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 23:00:33 +08:00
Raayan Dhar
92cdd398cd feat: Support mrope_section with rope_type: "yarn" (#13313)
Signed-off-by: Raayan Dhar raayan.dhar@gmail.com <raayan.dhar@gmail.com>
Signed-off-by: raayandhar <raayan.dhar@gmail.com>
2026-02-14 22:51:44 +08:00
Ke Bao
f51e9d9ca1 Add ci test for ring model (#18829) 2026-02-14 22:20:23 +08:00
JD
f6c18c3a85 Fix/partial gen from waiting queue miss metadata (#17610) 2026-02-13 19:04:08 -08:00
Liangsheng Yin
dcea74d63f Add timeout abort kits for normal / eagle. (#18815) 2026-02-13 17:57:30 -08:00
Minglei Zhu
8be18c655d [Perf] refactor piecewise cuda graph support of Qwen3-Next (#17613) 2026-02-14 09:30:50 +08:00
shuwenn
3299c4f9c1 [CI] feat: add early exit to wait_for_server when process dies (#18602) 2026-02-13 16:46:09 -08:00