sglang

mirror of https://github.com/kvcache-ai/sglang.git synced 2026-07-01 04:08:10 +00:00

Author	SHA1	Message	Date
shuwenn	18074e25dc	fix: scheduler launch hang when non-current rank dies (#20287)	2026-03-29 00:28:45 -07:00
Simon (Jiyou) Li	22e4733ab9	Add subprocess liveness monitor to detect scheduler crashes (#18582) Co-authored-by: 继优 <jiyou.ljy@alibaba-inc.com> Co-authored-by: shuwenn <47200617+alphabetc1@users.noreply.github.com>	2026-03-29 00:09:13 -07:00
Kangyan-Zhou	9d64a82173	feat(ci): add GB300 nightly benchmark test suites (#21487) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-28 21:54:03 -07:00
Lianmin Zheng	ba6b501f3a	Clean up detokenizer and remove dead multimodal_gen code (#21588) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-28 21:44:40 -07:00
Xiaoyu Zhang	516cff97a3	[Diffusion] Align diffusion benchmark skill presets with nightly comparison cases (#21616) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-03-29 12:12:17 +08:00
Yuan Luo	343a7ac652	[GDN] Fuse GDN kkt + solve_tril into one kernel (#21411) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>	2026-03-29 12:02:07 +08:00
jacky.cheng	c86f6c2831	[AMD] Add peft>=0.18.0 to diffusion_hip deps for transformers 5.x compat for AMD diffusion model (#21442) Co-authored-by: HaiShaw <hixiao@gmail.com>	2026-03-28 20:28:05 -07:00
Yuhao Yang	4e69f14b95	fix bench_serving sglang backend to support image dataset (#21294)	2026-03-29 10:02:11 +08:00
eigen	3ab9afd653	fix: piecewise_cuda_graph get correct qo_indptr (#21452) Co-authored-by: Avery Huang <averyh@nvidia.com>	2026-03-28 15:57:29 -07:00
Shu Wang	efebcab43e	Support skip-softmax attention (#19089)	2026-03-28 15:55:48 -07:00
Xinyuan Tong	ced69c9f84	feat: enable CUDA graph and timestamp for the whisper model (#21190)	2026-03-29 01:46:03 +08:00
Yuhao Yang	57cf4790ca	[VLM] Optimize ShmPointerMMData for multi-pickle safety and deferred unwrap (#21465)	2026-03-28 23:11:12 +08:00
Mick	fc9de157f9	[diffusion] feat: support overlay model materialization (#21600)	2026-03-28 23:02:38 +08:00
Aditya Sharma	627e162335	[diffusion] fix: fix Flux2-Klein prompt tokenization length to 512 and add regression coverage (#21407)	2026-03-28 17:28:02 +08:00
Baizhou Zhang	edd4d54023	[Clean] Remove deprecated environs (#21536)	2026-03-28 00:35:44 -07:00
Liangsheng Yin	402628e560	Patch transformers is_base_mistral in CI to avoid HF 429 rate limiting (#21586)	2026-03-27 22:19:36 -07:00
Jianying	daf02bde33	Fix Piecewise CUDA Graph crash with `--enable-mixed-chunk` (#20441) Co-authored-by: jianyingzhu <joeyzhu@nvidia.com>	2026-03-27 21:56:21 -07:00
Liangsheng Yin	19b1f75186	Fix HFRunner hang when subprocess dies during init (#21582)	2026-03-27 21:22:42 -07:00
Yuhao Yang	5ef56682b8	reduce CPU peak memory in multimodal tensor hashing (#21123)	2026-03-28 11:09:16 +08:00
Fengyuan Yu	9fa7b974fd	[diffusion] chore: remove redundant identity preprocess_text functions (#20633) Co-authored-by: Fengyuan Yu <15fengyuan@gmail.com>	2026-03-28 10:07:30 +08:00
Eitan Turok	e570ca96f6	[diffusion] refactor: Unify `TeaCacheParams` and `WanTeaCacheParams` (#20706) Co-authored-by: Mick <mickjagger19@icloud.com>	2026-03-28 09:51:44 +08:00
Mick	f0c68fbefd	[diffusion] UX: aggregate expected dtype-cast logs during weight loading (#21552)	2026-03-28 09:50:40 +08:00
Trevor Morris	7160b6cb76	[NVIDIA] Enable automatic NUMA configuration (#19452)	2026-03-27 18:44:13 -07:00
Vladislav Nosivskoy	c37200f5e4	Scope streaming backlog coalescing to incremental_streaming_output mode (#21037) Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com> Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>	2026-03-27 17:29:54 -07:00
Qiaolin Yu	a27651d5e0	Remove sync when enabling return_logprob (#20972)	2026-03-27 16:36:28 -07:00
Ethan (Yusheng) Su	6d48719e31	[1/n] lora support - Auto detect lora target modules (#21439) Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>	2026-03-27 16:08:36 -07:00
narutolhy	9b29131961	fix tp capture in vit cuda graph (#17255)	2026-03-27 22:38:18 +00:00
Muqi Li	38ad251738	feat: add gc_threshold arg (#21481) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-03-27 13:42:46 -07:00
huangtingwei	d864622a68	[Hicache & JIT_kernel] Support page first layout & mla jit kernel (#18311)	2026-03-27 08:54:36 -07:00
Bi Xue	30397e0a1e	[rl][sgl] fix tensor mismatch after pause (#21514)	2026-03-27 23:02:30 +08:00
yang1002378395-cmyk	279e7738c5	[diffusion] fix: return None instead of raising RuntimeError when no model info found (#21319) Co-authored-by: Mick <mickjagger19@icloud.com>	2026-03-27 22:42:39 +08:00
Xiaoyu Zhang	9238bd08a2	[CI] Register missing jit_kernel test files (#21547)	2026-03-27 19:39:08 +08:00
yang1002378395-cmyk	f83b1b73a8	[diffusion] feat: add --strict-ports option for predictable port assignment (#21320) Co-authored-by: 阳虎 <yanghu@yanghudeMacBook-Pro.local>	2026-03-27 16:40:50 +08:00
zwang86	5fc5c18bed	fix(security): replace unsafe pickle.loads with SafeUnpickler for CVE-2026-3989 (#20904)	2026-03-27 00:43:41 -07:00
Khoa Pham	8d4fca5908	[Security] 1/N: Bind ZMQ sockets to localhost to prevent unauthenticated remote access (#21435)	2026-03-26 23:33:49 -07:00
Xiaoyu Zhang	d633ab7349	[Diffusion] Add qknorm rope fuse kernel (#21440)	2026-03-27 14:27:08 +08:00
Xiaoyu Zhang	e8d46f145c	Opt jit qknorm_across_heads cuda kernel (#21503)	2026-03-27 13:30:46 +08:00
Johnsonms	8a56a7b04d	[jit_kernel] Migrate cast (downcast_fp8) from sgl-kernel AOT to JIT (#19103)	2026-03-27 13:21:44 +08:00
Johnsonms	c531be455e	[jit_kernel] Add fused_qknorm_rope JIT kernel (#19059) Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>	2026-03-27 13:21:28 +08:00
Mick	d7c4c57ace	[diffusion] refactor: move format-specific weight loading hooks (quant-related) to a dedicated file (#21366)	2026-03-27 09:58:49 +08:00
Liangsheng Yin	e1ee68d0fc	Release mm features on session close and support multiple /rerun-ut specs (#21501)	2026-03-26 18:31:29 -07:00
Aurick Qiao	c2b3e42ad6	Fix sessions with mm inputs (#21269)	2026-03-26 17:38:23 -07:00
Liangsheng Yin	8a4cdcd538	Simplify flush_cache: reject concurrent requests, remove client-side retry (#21490)	2026-03-26 16:31:04 -07:00
Liangsheng Yin	c580ddd19d	Fix benchmark generating empty prompts when random_input_len is small (#21492)	2026-03-26 16:24:35 -07:00
Baizhou Zhang	a93065679b	Revert "bugfix for weight loading for qwen3-next" (#21496)	2026-03-26 16:17:18 -07:00
SevenJ	2e65c27b29	Api add flush cache timeout (#21413) Signed-off-by: root <wenjun7j@gmail.com>	2026-03-26 14:44:37 -07:00
Qiaolin Yu	8c3ccef2d9	Fix Kimi K2.5 dp attention + spec decoding launch crash (#21391)	2026-03-26 14:40:26 -07:00
satyamk7054	be0cca5596	Use torch.addmm instead of separate mm and add_ calls for LoRA torch.native (#20562) Co-authored-by: Satyam Kumar <satyamk@linkedin.com>	2026-03-26 14:35:20 -07:00
satyamk7054	e59ea4f6e9	fix: torch-native LoRA for multi-adapter case (#20564) Co-authored-by: Satyam Kumar <satyamk@linkedin.com>	2026-03-26 14:34:16 -07:00
Liangsheng Yin	fb90c9d298	[Test] Consolidate eval accuracy test mixins into eval_accuracy_kit (#21047)	2026-03-26 14:26:46 -07:00

7855 Commits