fzyzcjy
|
cc63c99f11
|
Enhance hook mechanism in dumper (#19073)
|
2026-02-22 16:13:38 +08:00 |
|
fzyzcjy
|
fdf80b5031
|
Extract framework plugins in dumper (#19072)
|
2026-02-22 16:10:43 +08:00 |
|
fzyzcjy
|
e32b5364a2
|
Auto annotate context in dumper (#19071)
|
2026-02-22 16:08:48 +08:00 |
|
fzyzcjy
|
8bc0751376
|
Support enabling partial non intrusive dump in dumper (#19069)
|
2026-02-22 16:07:45 +08:00 |
|
fzyzcjy
|
0384c459a7
|
Support non-intrusive dumping in dumper (#19068)
|
2026-02-22 16:04:02 +08:00 |
|
fzyzcjy
|
5eccc3cff9
|
Refactor dumper and change on_forward_pass_start API (#19065)
|
2026-02-22 16:03:27 +08:00 |
|
Liangsheng Yin
|
1f2da824dd
|
[Benchmark] Remove re-exports from bench_serving.py (#19130)
|
2026-02-21 14:30:30 -08:00 |
|
Liangsheng Yin
|
3fb457b103
|
Tiny rename cuda_graph tests to piecewise_cuda_graph (#19128)
|
2026-02-21 13:46:01 -08:00 |
|
Xinyuan Tong
|
4a362a0e04
|
fix tool handling in OpenAIServingChat (#18996)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
|
2026-02-21 22:07:09 +08:00 |
|
Xinyuan Tong
|
cc451671b5
|
[FEAT] Add Anthropic compatible API endpoint (#18630)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
|
2026-02-21 19:37:38 +08:00 |
|
fzyzcjy
|
046ef0aa35
|
Support using SGLang port in dumper (#19038)
|
2026-02-20 12:30:24 +08:00 |
|
fzyzcjy
|
2fecc2c075
|
Support resetting and enhance HTTP endpoints for dumper (#19046)
Co-authored-by: Yueming Yuan <112649537+yueming-yuan@users.noreply.github.com>
|
2026-02-20 12:29:09 +08:00 |
|
fzyzcjy
|
503bf3047a
|
Enhance configure and env parsing in dumper (#19034)
|
2026-02-20 12:28:10 +08:00 |
|
fzyzcjy
|
df995aab56
|
Support filtering labels in dumper (#19018)
|
2026-02-20 12:27:12 +08:00 |
|
fzyzcjy
|
261bca3c58
|
Support captured dump output and console output control in dumper (#19017)
|
2026-02-20 12:26:24 +08:00 |
|
fzyzcjy
|
fc1500adc6
|
Hint users when wrongly execute it with partial ranks in dumper (#19014)
|
2026-02-20 12:25:54 +08:00 |
|
fzyzcjy
|
b41d412c3d
|
Support cleanup previous dumps in dumper (#19013)
|
2026-02-20 12:25:21 +08:00 |
|
Cheng Wan
|
73a7f0d049
|
Revert "Add SDAR model support" (#19032)
|
2026-02-19 16:03:56 -08:00 |
|
chengshuang18
|
44ab752b7a
|
Add SDAR model support (#18318)
Co-authored-by: root <root@gpu-lg-cmc-h-h200-3047.host.h.pjlab.org.cn>
Co-authored-by: chengshuang <chengshuang@pjlab.org.cn>
Co-authored-by: 赵晨阳 <zhaochen20@outlook.com>
|
2026-02-19 11:20:32 -08:00 |
|
satyamk7054
|
963def7f26
|
Move lora request validation to tokenizer_manager from server (#18962)
Co-authored-by: Satyam Kumar <satyamk@linkedin.com>
|
2026-02-19 21:03:19 +08:00 |
|
pansicheng
|
48642d5384
|
[RadixTree][4/N Refactor]: Move available_and_evictable_str to individual radix cache classes (#17852)
|
2026-02-19 17:03:15 +08:00 |
|
shaharmor98
|
82a0bafc1c
|
Feat/add fi selective state update kernel call (#18070)
Signed-off-by: Shahar Mor <smor@nvidia.com>
|
2026-02-19 16:56:06 +08:00 |
|
Yuwei An
|
0be30d4b0d
|
Fix PCG MoE Error (#17739)
|
2026-02-19 16:48:06 +08:00 |
|
Ethan (Yusheng) Su
|
9c5aae4df5
|
[Fix] Add lora tied lm head support (for Qwen2.5, Gemma, etc model need) (#18634)
|
2026-02-19 00:34:51 +08:00 |
|
Alison Shao
|
34d975b18f
|
Fix eval tests not capturing server launch failures (#18886)
|
2026-02-18 07:59:03 +08:00 |
|
Liangsheng Yin
|
83a475e8d7
|
feat: add cuda core dump CI wrapper (#18909)
|
2026-02-17 14:49:26 -08:00 |
|
Minglei Zhu
|
bf52388354
|
[PCG] support piecewise cuda graph for kimi-linear model (#18849)
|
2026-02-17 23:31:12 +08:00 |
|
Alison Shao
|
7e41ac6c8d
|
Skip flaky test_tool_choice_required_non_streaming for Mistral (#18889)
|
2026-02-17 12:50:55 +08:00 |
|
Shivam jindal
|
4f0409f8aa
|
[Model] Add Qwen3ForRewardModel and fix Qwen3ForSequenceClassification (#17992)
Co-authored-by: yes-its-shivam <yes-its-shivam@users.noreply.github.com>
|
2026-02-16 19:44:41 +08:00 |
|
fzyzcjy
|
f554b3c27b
|
Support dumping gradients, parameters, lazy values (#18881)
Co-authored-by: Yueming Yuan <112649537+yueming-yuan@users.noreply.github.com>
|
2026-02-16 13:34:06 +08:00 |
|
fzyzcjy
|
9a7d8d5eb0
|
Collect upper level metadata to dump output (#18880)
|
2026-02-16 13:31:19 +08:00 |
|
fzyzcjy
|
949792d0c6
|
Change dump output format to dict with value and metadata (#18879)
|
2026-02-16 13:30:47 +08:00 |
|
fzyzcjy
|
02816abc0d
|
Flip dumper to disable by default and refactor environment handling (#18878)
|
2026-02-16 13:29:32 +08:00 |
|
Rain Jiang
|
0ffd0a3995
|
Nsa trtllm mla sparse fp8 support with Deepseek v3.2 NVFP4 (#18389)
|
2026-02-16 09:29:54 +08:00 |
|
Mohammad Miadh Angkad
|
8290171f52
|
[CI] Remove --mem-fraction-static 0.93 from gpt-oss test (#18869)
|
2026-02-16 09:24:11 +08:00 |
|
Chanh Nguyen
|
597d17dd18
|
Use ephemeral nccl port via get_free_port() (#18009)
Co-authored-by: Chanh Nguyen <cnguyen@linkedin.com>
|
2026-02-16 00:32:47 +08:00 |
|
Zack Yu
|
536ed3143b
|
test: add test for Modelopt FP8 on SM90 (#18463)
Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-02-16 00:29:37 +08:00 |
|
SoluMilken
|
07a24f1a38
|
update pre-commit config (#18860)
|
2026-02-16 00:18:31 +08:00 |
|
Michael
|
88010e9601
|
[AMD] Fix nightly 1-GPU test failures and bench_serving regression (#18761)
Co-authored-by: michaelzhang-ai <michaelzhang-ai@users.noreply.github.com>
|
2026-02-15 20:36:47 +08:00 |
|
fzyzcjy
|
90555a0228
|
Add missing dumper tests (#18859)
|
2026-02-15 18:42:57 +08:00 |
|
fzyzcjy
|
4c7f986c6b
|
Extract dumper and prefill delayer tests common utils (#18857)
|
2026-02-15 18:33:23 +08:00 |
|
Bhavneek Singh
|
1ce3420784
|
Model: Support IBM Granite (Dense/Mamba + MoE) (#18040)
|
2026-02-15 11:24:41 +08:00 |
|
Xiaoyu Zhang
|
c29394e3c8
|
[kernel slimming] Move fast_hadamard_transform to jit_kernel (#18475)
|
2026-02-14 23:06:21 +08:00 |
|
Kangyan-Zhou
|
ae95869292
|
Enable SGLANG_ENABLE_SPEC_V2 for nightly speculative decoding tests (#18719)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-02-14 23:00:33 +08:00 |
|
Raayan Dhar
|
92cdd398cd
|
feat: Support mrope_section with rope_type: "yarn" (#13313)
Signed-off-by: Raayan Dhar raayan.dhar@gmail.com <raayan.dhar@gmail.com>
Signed-off-by: raayandhar <raayan.dhar@gmail.com>
|
2026-02-14 22:51:44 +08:00 |
|
Ke Bao
|
f51e9d9ca1
|
Add ci test for ring model (#18829)
|
2026-02-14 22:20:23 +08:00 |
|
JD
|
f6c18c3a85
|
Fix/partial gen from waiting queue miss metadata (#17610)
|
2026-02-13 19:04:08 -08:00 |
|
Liangsheng Yin
|
dcea74d63f
|
Add timeout abort kits for normal / eagle. (#18815)
|
2026-02-13 17:57:30 -08:00 |
|
Minglei Zhu
|
8be18c655d
|
[Perf] refactor piecewise cuda graph support of Qwen3-Next (#17613)
|
2026-02-14 09:30:50 +08:00 |
|
shuwenn
|
3299c4f9c1
|
[CI] feat: add early exit to wait_for_server when process dies (#18602)
|
2026-02-13 16:46:09 -08:00 |
|