sglang

mirror of https://github.com/kvcache-ai/sglang.git synced 2026-07-01 12:17:09 +00:00

Author	SHA1	Message	Date
Shu Wang	efebcab43e	Support skip-softmax attention (#19089)	2026-03-28 15:55:48 -07:00
Xinyuan Tong	ced69c9f84	feat: enable CUDA graph and timestamp for the whisper model (#21190)	2026-03-29 01:46:03 +08:00
Yuhao Yang	57cf4790ca	[VLM] Optimize ShmPointerMMData for multi-pickle safety and deferred unwrap (#21465)	2026-03-28 23:11:12 +08:00
Mick	fc9de157f9	[diffusion] feat: support overlay model materialization (#21600)	2026-03-28 23:02:38 +08:00
Aditya Sharma	627e162335	[diffusion] fix: fix Flux2-Klein prompt tokenization length to 512 and add regression coverage (#21407)	2026-03-28 17:28:02 +08:00
Baizhou Zhang	edd4d54023	[Clean] Remove deprecated environs (#21536)	2026-03-28 00:35:44 -07:00
Liangsheng Yin	402628e560	Patch transformers is_base_mistral in CI to avoid HF 429 rate limiting (#21586)	2026-03-27 22:19:36 -07:00
Jianying	daf02bde33	Fix Piecewise CUDA Graph crash with `--enable-mixed-chunk` (#20441) Co-authored-by: jianyingzhu <joeyzhu@nvidia.com>	2026-03-27 21:56:21 -07:00
Liangsheng Yin	19b1f75186	Fix HFRunner hang when subprocess dies during init (#21582)	2026-03-27 21:22:42 -07:00
Yuhao Yang	5ef56682b8	reduce CPU peak memory in multimodal tensor hashing (#21123)	2026-03-28 11:09:16 +08:00
Fengyuan Yu	9fa7b974fd	[diffusion] chore: remove redundant identity preprocess_text functions (#20633) Co-authored-by: Fengyuan Yu <15fengyuan@gmail.com>	2026-03-28 10:07:30 +08:00
Eitan Turok	e570ca96f6	[diffusion] refactor: Unify `TeaCacheParams` and `WanTeaCacheParams` (#20706) Co-authored-by: Mick <mickjagger19@icloud.com>	2026-03-28 09:51:44 +08:00
Mick	f0c68fbefd	[diffusion] UX: aggregate expected dtype-cast logs during weight loading (#21552)	2026-03-28 09:50:40 +08:00
Trevor Morris	7160b6cb76	[NVIDIA] Enable automatic NUMA configuration (#19452)	2026-03-27 18:44:13 -07:00
Vladislav Nosivskoy	c37200f5e4	Scope streaming backlog coalescing to incremental_streaming_output mode (#21037) Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com> Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>	2026-03-27 17:29:54 -07:00
Qiaolin Yu	a27651d5e0	Remove sync when enabling return_logprob (#20972)	2026-03-27 16:36:28 -07:00
Ethan (Yusheng) Su	6d48719e31	[1/n] lora support - Auto detect lora target modules (#21439) Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>	2026-03-27 16:08:36 -07:00
narutolhy	9b29131961	fix tp capture in vit cuda graph (#17255)	2026-03-27 22:38:18 +00:00
Muqi Li	38ad251738	feat: add gc_threshold arg (#21481) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-03-27 13:42:46 -07:00
huangtingwei	d864622a68	[Hicache & JIT_kernel] Support page first layout & mla jit kernel (#18311)	2026-03-27 08:54:36 -07:00
Bi Xue	30397e0a1e	[rl][sgl] fix tensor mismatch after pause (#21514)	2026-03-27 23:02:30 +08:00
yang1002378395-cmyk	279e7738c5	[diffusion] fix: return None instead of raising RuntimeError when no model info found (#21319) Co-authored-by: Mick <mickjagger19@icloud.com>	2026-03-27 22:42:39 +08:00
Xiaoyu Zhang	9238bd08a2	[CI] Register missing jit_kernel test files (#21547)	2026-03-27 19:39:08 +08:00
yang1002378395-cmyk	f83b1b73a8	[diffusion] feat: add --strict-ports option for predictable port assignment (#21320) Co-authored-by: 阳虎 <yanghu@yanghudeMacBook-Pro.local>	2026-03-27 16:40:50 +08:00
zwang86	5fc5c18bed	fix(security): replace unsafe pickle.loads with SafeUnpickler for CVE-2026-3989 (#20904)	2026-03-27 00:43:41 -07:00
Khoa Pham	8d4fca5908	[Security] 1/N: Bind ZMQ sockets to localhost to prevent unauthenticated remote access (#21435)	2026-03-26 23:33:49 -07:00
Xiaoyu Zhang	d633ab7349	[Diffusion] Add qknorm rope fuse kernel (#21440)	2026-03-27 14:27:08 +08:00
Xiaoyu Zhang	e8d46f145c	Opt jit qknorm_across_heads cuda kernel (#21503)	2026-03-27 13:30:46 +08:00
Johnsonms	8a56a7b04d	[jit_kernel] Migrate cast (downcast_fp8) from sgl-kernel AOT to JIT (#19103)	2026-03-27 13:21:44 +08:00
Johnsonms	c531be455e	[jit_kernel] Add fused_qknorm_rope JIT kernel (#19059) Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>	2026-03-27 13:21:28 +08:00
Mick	d7c4c57ace	[diffusion] refactor: move format-specific weight loading hooks (quant-related) to a dedicated file (#21366)	2026-03-27 09:58:49 +08:00
Liangsheng Yin	e1ee68d0fc	Release mm features on session close and support multiple /rerun-ut specs (#21501)	2026-03-26 18:31:29 -07:00
Aurick Qiao	c2b3e42ad6	Fix sessions with mm inputs (#21269)	2026-03-26 17:38:23 -07:00
Liangsheng Yin	8a4cdcd538	Simplify flush_cache: reject concurrent requests, remove client-side retry (#21490)	2026-03-26 16:31:04 -07:00
Liangsheng Yin	c580ddd19d	Fix benchmark generating empty prompts when random_input_len is small (#21492)	2026-03-26 16:24:35 -07:00
Baizhou Zhang	a93065679b	Revert "bugfix for weight loading for qwen3-next" (#21496)	2026-03-26 16:17:18 -07:00
SevenJ	2e65c27b29	Api add flush cache timeout (#21413) Signed-off-by: root <wenjun7j@gmail.com>	2026-03-26 14:44:37 -07:00
Qiaolin Yu	8c3ccef2d9	Fix Kimi K2.5 dp attention + spec decoding launch crash (#21391)	2026-03-26 14:40:26 -07:00
satyamk7054	be0cca5596	Use torch.addmm instead of separate mm and add_ calls for LoRA torch.native (#20562) Co-authored-by: Satyam Kumar <satyamk@linkedin.com>	2026-03-26 14:35:20 -07:00
satyamk7054	e59ea4f6e9	fix: torch-native LoRA for multi-adapter case (#20564) Co-authored-by: Satyam Kumar <satyamk@linkedin.com>	2026-03-26 14:34:16 -07:00
Liangsheng Yin	fb90c9d298	[Test] Consolidate eval accuracy test mixins into eval_accuracy_kit (#21047)	2026-03-26 14:26:46 -07:00
Liangsheng Yin	e5b7650353	Fix UnboundLocalError when DetokenizerManager constructor fails (#21471)	2026-03-26 13:00:16 -07:00
Ho-Ren (Jack) Chuang	4b5f63e1b8	FIX: (NSA) Compute topk_indices_offset when NSA prefill flashmla_sparse is used with FP8 KV cache (#20606) Signed-off-by: Ho-Ren (Jack) Chuang <horenchuang@bytedance.com>	2026-03-26 12:50:50 -07:00
jianzhao-xu	3867c6431a	Fix bug in dbrx model (#21445) Co-authored-by: Jianzhao Xu <xujianchao@huawei.com>	2026-03-26 11:23:30 -07:00
shuwenn	646573e4e8	fix: use get_rope_config() to support models without rope_parameters (#21135)	2026-03-26 11:22:12 -07:00
McZyWu	0906e45cec	bugfix for weight loading for qwen3-next (#21313)	2026-03-26 21:21:00 +08:00
Mick	35720d9969	[diffusion] fix: fix qwen-image with nunchaku (#21415)	2026-03-26 16:31:44 +08:00
Anant Sharma	f289d173aa	[Deps] Bump xgrammar to 0.1.32 (#21032)	2026-03-26 01:22:37 -07:00
Chen, Zhentao	fd535942ac	[AMD] Integrate aiter's fused_topk for softmax scoring in topk function (#21421) Co-authored-by: Chen, Todd <zhenchen@amd.com>	2026-03-26 00:57:56 -07:00
R0CKSTAR	a305964159	[MLX] Add native MLX execution backend for Apple Silicon Mac (#20342) Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>	2026-03-26 00:09:17 -07:00


7296 Commits