Prozac614
|
e3b581ce6b
|
[diffusion] fix: remove num_frames in wan2_1_t2v_1_3b_lora_1gpu test (#20009)
Co-authored-by: daiweitao <dwti614707404@163.com>
|
2026-03-06 21:36:43 +08:00 |
|
Kangyan-Zhou
|
25e678d933
|
[diffusion] endpoint: add /server_info and /model_info endpoints for gateway discovery (#20020)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-03-06 21:36:13 +08:00 |
|
inkcherry
|
84aaa69795
|
[AMD] Use bfloat16 for correction_bias in AITER FP8 path to avoid runtime dtype conversion for dsv3 (#19843)
|
2026-03-06 00:57:12 -08:00 |
|
Clint
|
27053aa5ed
|
Fix MLA decode path returning unwritten (padded) rows (#19902)
|
2026-03-06 00:54:29 -08:00 |
|
xdtbynd
|
0252ca8255
|
[Bugfix] Fix the bug blocking the startup of Llama-3.2-11b-Vision-Instruct (#19638)
Co-authored-by: sglang-npu-bot <sglangnpu@163.com>
|
2026-03-06 16:21:50 +08:00 |
|
Zheng Wengang
|
da27d9bff6
|
[Bug-Fix][EPD]: skip log waiting-image-req for zmq_to_tokenzer/mooncake (#19555)
|
2026-03-06 14:39:22 +08:00 |
|
Mook
|
be9a9e4819
|
refactor(multimodal/test): centralize model names and shared utilities in test_utils (#19354)
Co-authored-by: Ratish P <114130421+Ratish1@users.noreply.github.com>
|
2026-03-05 20:09:42 -08:00 |
|
Baizhou Zhang
|
51e5dc845a
|
Revert "[Kernel Slimming] Migrate NVFP4 kernels to JIT" (#20005)
|
2026-03-05 19:40:00 -08:00 |
|
sushil Dubey
|
6e5a2de354
|
[diffusion] fix: fix reading multiple prompts from prompt file (#19075)
Signed-off-by: Sushil Dubey <sushil.dubey@intel.com>
|
2026-03-06 11:23:31 +08:00 |
|
Simo Lin
|
9502369488
|
fix(grpc): add server-side keepalive options to prevent GOAWAY (#19986)
Signed-off-by: Simo Lin <linsimo.mark@gmail.com>
|
2026-03-05 18:56:35 -08:00 |
|
liupeng374
|
5471e4a492
|
[NPU][Feature] eliminate dsv3 redundant rotary embed calculation (#19842)
|
2026-03-06 09:02:14 +08:00 |
|
chenxu214
|
b912d7ae19
|
[OPT]Skip the first delayer to maximize the BS of the decoding. (#19836)
|
2026-03-06 08:53:19 +08:00 |
|
shadowxz109
|
261be85ecc
|
Support mrope_position_delta cache
|
2026-03-06 08:50:53 +08:00 |
|
Xinyuan Tong
|
9ebffef1ef
|
[FIX] NSA backend page_table overflow in speculative decoding target_verify (#19016)
|
2026-03-05 16:04:58 -08:00 |
|
Ajay Anubolu
|
13af7cbb02
|
fix: use consistent time denominator for throughput metrics in bench_one_batch_server (#19223)
|
2026-03-05 15:58:17 -08:00 |
|
Chang Su
|
dd2bbe6d62
|
fix(grpc): use context.abort() with proper status codes instead of in-band errors (#19972)
Signed-off-by: Chang Su <chang.s.su@oracle.com>
|
2026-03-05 14:53:18 -08:00 |
|
Qiaolin Yu
|
46dced64ea
|
Adjust padding size to improve triton_kernels moe performance (#19174)
|
2026-03-05 14:50:40 -08:00 |
|
kpham-sgl
|
346a4131cf
|
[Spec] Refactor NaN/OOB checks to async maybe_detect_* with env-var control (#19899)
Co-authored-by: hnyls2002 <lsyincs@gmail.com>
|
2026-03-05 13:51:05 -08:00 |
|
Xinyu Zhang
|
b3cfad0a80
|
Add Ray actor support for scheduler process management (DP=1) (#17684)
Co-authored-by: Cursor <cursoragent@cursor.com>
|
2026-03-05 13:21:23 -08:00 |
|
sglang-bot
|
ebb66cc1de
|
[misc] Priority scheduling metrics cleanup (#19927)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-03-05 12:42:42 -08:00 |
|
danielafrimi
|
ff6048fb9c
|
rename nemotron reasoning parser (#19865)
Signed-off-by: dafrimi <dafrimi@nvidia.com>
|
2026-03-05 11:27:07 -08:00 |
|
Mohammad Miadh Angkad
|
41fd53fe37
|
Fix profile_activities parameter name in bench_one_batch_server_internal.py (#19954)
|
2026-03-05 10:34:06 -08:00 |
|
akhilg-nv
|
73d272bddb
|
Revised fix for HybridAttnBackend forward for linear attn (#19369)
|
2026-03-06 00:05:35 +08:00 |
|
Zheng Wengang
|
0de0d74195
|
[EPD][Feat]support adaptive forward (#18118)
|
2026-03-05 21:12:30 +08:00 |
|
StonyPort
|
806d41ab65
|
[quant] fix fp32 downcasting (#19844)
Co-authored-by: qiuxuan.lzw <qiuxuan.lzw@alibaba-inc.com>
|
2026-03-05 17:54:59 +08:00 |
|
Rain Jiang
|
472eef4071
|
fa4 cleanup (#19727)
|
2026-03-05 17:54:25 +08:00 |
|
Chi McIsaac
|
c36de62bfc
|
[diffusion] fix images/edit with 2 images (#17520)
Signed-off-by: Chi McIsaac <chixie.mcisaac@gmail.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
|
2026-03-05 16:56:39 +08:00 |
|
xingsy97
|
dbc896f204
|
[Test] Enhance JIT kvcache store kernel test coverage (#19630)
|
2026-03-05 16:17:15 +08:00 |
|
Tiwei Bie
|
727face6c2
|
[DLLM] Add initial radix cache support (#18724)
|
2026-03-04 23:24:09 -08:00 |
|
Kalyan Kumar
|
c1df359b44
|
Add XPU profiler activity support in benchmark code (#12981)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-03-04 23:22:56 -08:00 |
|
Mohammad Miadh Angkad
|
2bdd89a6cd
|
[Kernel Slimming] Migrate NVFP4 kernels to JIT (#19437)
|
2026-03-05 15:22:28 +08:00 |
|
Yilong Zhao
|
1bbfed0539
|
[misc] add env for http keep alive timeout (#19847)
|
2026-03-04 22:00:51 -08:00 |
|
Chenxi Li
|
86c5617787
|
[BUG]: fix prevent illegal memory access in Mamba SSM tracking during EAGLE speculative verification (#19415)
Co-authored-by: ConnorLi96 <ConnorLi96@users.noreply.github.com>
|
2026-03-04 21:13:21 -08:00 |
|
Baizhou Zhang
|
10c65df48a
|
[Bug] Fix lora tp bug on H200 (#19769)
|
2026-03-04 20:11:02 -08:00 |
|
Xinyi Song
|
0e6a64712a
|
[bugfix] Fix PPMissingLayer AttributeError when Using PP (#19804)
|
2026-03-04 19:48:15 -08:00 |
|
Kangyan-Zhou
|
198381d9ce
|
Add SSL/TLS support for HTTP and gRPC servers (#18973)
Co-authored-by: guys@spotify.com
|
2026-03-04 19:27:16 -08:00 |
|
Junhao Liu
|
9c11a7ae40
|
[diffusion] fix: fix the frame interpolation testcase in CI regarding number of frames (#19659)
Co-authored-by: Mick <mickjagger19@icloud.com>
|
2026-03-05 11:21:53 +08:00 |
|
R0CKSTAR
|
fc53307ce9
|
[diffusion] hardware: SiluAndMul/RMSNorm/LayerNorm MUSA implementations (custom ops, 12/N) (#18583)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
Co-authored-by: Qingfu Wen <qingfu.wen@mthreads.com>
|
2026-03-05 11:10:57 +08:00 |
|
Xiaoyu Zhang
|
9795b4cd5b
|
[Diffusion] Open t5 encoder parallel folding for wan2.2 and mova video (#18493)
Co-authored-by: Mick <mickjagger19@icloud.com>
|
2026-03-05 10:18:00 +08:00 |
|
Ethan (Yusheng) Su
|
e555a6c171
|
[feat] Enhance lora_update_weight_from_tensor for RL training (#19314)
|
2026-03-04 18:10:42 -08:00 |
|
Shu Wang
|
43bdee703e
|
Fix Fp8 MTP layer a2a backend without EP. (#18515)
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
|
2026-03-04 16:28:10 -08:00 |
|
Liangsheng Yin
|
33c92732f4
|
[Triton] Use dynamic loop bound in alloc_extend_kernel (#19898)
|
2026-03-04 16:15:58 -08:00 |
|
rakesh
|
a710b7d791
|
[Sarvam] Add inference support for Sarvam MoE LLMs (#18938)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-03-04 15:28:00 -08:00 |
|
kpham-sgl
|
376dfb03f7
|
Fix issue 19717 by making qo_indptr uniform strided instead of packed (#19807)
|
2026-03-04 15:27:10 -08:00 |
|
zhuxinjie-nz
|
28c931e1a5
|
feat: Priority-based scheduling optimization (including default priority, preemption toggle, priority-based metrics, etc.) (#17026)
Co-authored-by: hnyls2002 <lsyincs@gmail.com>
|
2026-03-04 14:52:08 -08:00 |
|
hlu1
|
9457c049e1
|
[Qwen3.5] Enable MTP spec_v2 and add test for nvidia/Qwen3.5-397B-A17B-NVFP4 (#19391)
|
2026-03-04 14:01:25 -08:00 |
|
Chang Su
|
0ee9d3c8e9
|
fix(grpc): send last chunk before completion during streaming (#19895)
|
2026-03-04 13:21:21 -08:00 |
|
Bingxu Chen
|
329817e262
|
[AMD] Move get_global_server_args import out of CUDA-only block to fix NameError on AMD (#19866)
|
2026-03-04 10:23:42 -08:00 |
|
Ken J
|
44208d2adf
|
[vlm][minicpm] support input formats of processor output and embedding (#19614)
|
2026-03-04 12:11:12 -05:00 |
|
Kangyan-Zhou
|
c03deb8175
|
Fix disagg PD bootstrap and KV transfer metrics (#19009)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-03-04 09:08:10 -08:00 |
|