Commit Graph

7855 Commits

Author SHA1 Message Date
Prozac614
e3b581ce6b [diffusion] fix: remove num_frames in wan2_1_t2v_1_3b_lora_1gpu test (#20009)
Co-authored-by: daiweitao <dwti614707404@163.com>
2026-03-06 21:36:43 +08:00
Kangyan-Zhou
25e678d933 [diffusion] endpoint: add /server_info and /model_info endpoints for gateway discovery (#20020)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 21:36:13 +08:00
inkcherry
84aaa69795 [AMD] Use bfloat16 for correction_bias in AITER FP8 path to avoid runtime dtype conversion for dsv3 (#19843)
2026-03-06 00:57:12 -08:00
Clint
27053aa5ed Fix MLA decode path returning unwritten (padded) rows (#19902)
2026-03-06 00:54:29 -08:00
xdtbynd
0252ca8255 [Bugfix] Fix the bug blocking the startup of Llama-3.2-11b-Vision-Instruct (#19638)
Co-authored-by: sglang-npu-bot <sglangnpu@163.com>
2026-03-06 16:21:50 +08:00
Zheng Wengang
da27d9bff6 [Bug-Fix][EPD]: skip log waiting-image-req for zmq_to_tokenzer/mooncake (#19555)
2026-03-06 14:39:22 +08:00
Mook
be9a9e4819 refactor(multimodal/test): centralize model names and shared utilities in test_utils (#19354)
Co-authored-by: Ratish P <114130421+Ratish1@users.noreply.github.com>
2026-03-05 20:09:42 -08:00
Baizhou Zhang
51e5dc845a Revert "[Kernel Slimming] Migrate NVFP4 kernels to JIT" (#20005)
2026-03-05 19:40:00 -08:00
sushil Dubey
6e5a2de354 [diffusion] fix: fix reading multiple prompts from prompt file (#19075)
Signed-off-by: Sushil Dubey <sushil.dubey@intel.com>
2026-03-06 11:23:31 +08:00
Simo Lin
9502369488 fix(grpc): add server-side keepalive options to prevent GOAWAY (#19986)
Signed-off-by: Simo Lin <linsimo.mark@gmail.com>
2026-03-05 18:56:35 -08:00
liupeng374
5471e4a492 [NPU][Feature] eliminate dsv3 redundant rotary embed calculation (#19842)
2026-03-06 09:02:14 +08:00
chenxu214
b912d7ae19 [OPT]Skip the first delayer to maximize the BS of the decoding. (#19836)
2026-03-06 08:53:19 +08:00
shadowxz109
261be85ecc Support mrope_position_delta cache
2026-03-06 08:50:53 +08:00
Xinyuan Tong
9ebffef1ef [FIX] NSA backend page_table overflow in speculative decoding target_verify (#19016)
2026-03-05 16:04:58 -08:00
Ajay Anubolu
13af7cbb02 fix: use consistent time denominator for throughput metrics in bench_one_batch_server (#19223)
2026-03-05 15:58:17 -08:00
Chang Su
dd2bbe6d62 fix(grpc): use context.abort() with proper status codes instead of in-band errors (#19972)
Signed-off-by: Chang Su <chang.s.su@oracle.com>
2026-03-05 14:53:18 -08:00
Qiaolin Yu
46dced64ea Adjust padding size to improve triton_kernels moe performance (#19174)
2026-03-05 14:50:40 -08:00
kpham-sgl
346a4131cf [Spec] Refactor NaN/OOB checks to async maybe_detect_* with env-var control (#19899)
Co-authored-by: hnyls2002 <lsyincs@gmail.com>
2026-03-05 13:51:05 -08:00
Xinyu Zhang
b3cfad0a80 Add Ray actor support for scheduler process management (DP=1) (#17684)
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-03-05 13:21:23 -08:00
sglang-bot
ebb66cc1de [misc] Priority scheduling metrics cleanup (#19927)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-05 12:42:42 -08:00
danielafrimi
ff6048fb9c rename nemotron reasoning parser (#19865)
Signed-off-by: dafrimi <dafrimi@nvidia.com>
2026-03-05 11:27:07 -08:00
Mohammad Miadh Angkad
41fd53fe37 Fix profile_activities parameter name in bench_one_batch_server_internal.py (#19954)
2026-03-05 10:34:06 -08:00
akhilg-nv
73d272bddb Revised fix for HybridAttnBackend forward for linear attn (#19369)
2026-03-06 00:05:35 +08:00
Zheng Wengang
0de0d74195 [EPD][Feat]support adaptive forward (#18118)
2026-03-05 21:12:30 +08:00
StonyPort
806d41ab65 [quant] fix fp32 downcasting (#19844)
Co-authored-by: qiuxuan.lzw <qiuxuan.lzw@alibaba-inc.com>
2026-03-05 17:54:59 +08:00
Rain Jiang
472eef4071 fa4 cleanup (#19727)
2026-03-05 17:54:25 +08:00
Chi McIsaac
c36de62bfc [diffusion] fix images/edit with 2 images (#17520)
Signed-off-by: Chi McIsaac <chixie.mcisaac@gmail.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
2026-03-05 16:56:39 +08:00
xingsy97
dbc896f204 [Test] Enhance JIT kvcache store kernel test coverage (#19630)
2026-03-05 16:17:15 +08:00
Tiwei Bie
727face6c2 [DLLM] Add initial radix cache support (#18724)
2026-03-04 23:24:09 -08:00
Kalyan Kumar
c1df359b44 Add XPU profiler activity support in benchmark code (#12981)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-03-04 23:22:56 -08:00
Mohammad Miadh Angkad
2bdd89a6cd [Kernel Slimming] Migrate NVFP4 kernels to JIT (#19437)
2026-03-05 15:22:28 +08:00
Yilong Zhao
1bbfed0539 [misc] add env for http keep alive timeout (#19847)
2026-03-04 22:00:51 -08:00
Chenxi Li
86c5617787 [BUG]: fix prevent illegal memory access in Mamba SSM tracking during EAGLE speculative verification (#19415)
Co-authored-by: ConnorLi96 <ConnorLi96@users.noreply.github.com>
2026-03-04 21:13:21 -08:00
Baizhou Zhang
10c65df48a [Bug] Fix lora tp bug on H200 (#19769)
2026-03-04 20:11:02 -08:00
Xinyi Song
0e6a64712a [bugfix] Fix PPMissingLayer AttributeError when Using PP (#19804)
2026-03-04 19:48:15 -08:00
Kangyan-Zhou
198381d9ce Add SSL/TLS support for HTTP and gRPC servers (#18973)
Co-authored-by: guys@spotify.com
2026-03-04 19:27:16 -08:00
Junhao Liu
9c11a7ae40 [diffusion] fix: fix the frame interpolation testcase in CI regarding number of frames (#19659)
Co-authored-by: Mick <mickjagger19@icloud.com>
2026-03-05 11:21:53 +08:00
R0CKSTAR
fc53307ce9 [diffusion] hardware: SiluAndMul/RMSNorm/LayerNorm MUSA implementations (custom ops, 12/N) (#18583)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
Co-authored-by: Qingfu Wen <qingfu.wen@mthreads.com>
2026-03-05 11:10:57 +08:00
Xiaoyu Zhang
9795b4cd5b [Diffusion] Open t5 encoder parallel folding for wan2.2 and mova video (#18493)
Co-authored-by: Mick <mickjagger19@icloud.com>
2026-03-05 10:18:00 +08:00
Ethan (Yusheng) Su
e555a6c171 [feat] Enhance lora_update_weight_from_tensor for RL training (#19314)
2026-03-04 18:10:42 -08:00
Shu Wang
43bdee703e Fix Fp8 MTP layer a2a backend without EP. (#18515)
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
2026-03-04 16:28:10 -08:00
Liangsheng Yin
33c92732f4 [Triton] Use dynamic loop bound in alloc_extend_kernel (#19898)
2026-03-04 16:15:58 -08:00
rakesh
a710b7d791 [Sarvam] Add inference support for Sarvam MoE LLMs (#18938)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-03-04 15:28:00 -08:00
kpham-sgl
376dfb03f7 Fix issue 19717 by making qo_indptr uniform strided instead of packed (#19807)
2026-03-04 15:27:10 -08:00
zhuxinjie-nz
28c931e1a5 feat: Priority-based scheduling optimization (including default priority, preemption toggle, priority-based metrics, etc.) (#17026)
Co-authored-by: hnyls2002 <lsyincs@gmail.com>
2026-03-04 14:52:08 -08:00
hlu1
9457c049e1 [Qwen3.5] Enable MTP spec_v2 and add test for nvidia/Qwen3.5-397B-A17B-NVFP4 (#19391)
2026-03-04 14:01:25 -08:00
Chang Su
0ee9d3c8e9 fix(grpc): send last chunk before completion during streaming (#19895)
2026-03-04 13:21:21 -08:00
Bingxu Chen
329817e262 [AMD] Move get_global_server_args import out of CUDA-only block to fix NameError on AMD (#19866)
2026-03-04 10:23:42 -08:00
Ken J
44208d2adf [vlm][minicpm] support input formats of processor output and embedding (#19614)
2026-03-04 12:11:12 -05:00
Kangyan-Zhou
c03deb8175 Fix disagg PD bootstrap and KV transfer metrics (#19009)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-04 09:08:10 -08:00