sglang

mirror of https://github.com/kvcache-ai/sglang.git synced 2026-07-01 20:27:57 +00:00

Author	SHA1	Message	Date
elvischenv	205f041e96	Add Mistral Large 3 Eagle Support (#14466) Co-authored-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>	2025-12-05 23:11:41 +08:00
Simo Lin	7235a7fbe9	[misc] add model arch and type to server info and use it for harmony (#14456)	2025-12-05 06:51:00 -08:00
Yuxuan Zhang	8fce9e7b2a	support GLM-V vision model dp (#14097)	2025-12-05 21:03:54 +08:00
Xiaoyu Zhang	5347732219	[diffusion] fix: Fix profiler trace missing Python stack in diffusion pipeline (#14499)	2025-12-05 12:12:35 +00:00
roikoren755	2ce121a1c3	Enable RadixCache for Mamba2 models (#13584)	2025-12-05 18:23:58 +08:00
WenhaoZhang	35ba6fe19e	[diffusion] fix: fix CLIP text encoder attention mask not used (#14364) Co-authored-by: niehen6174 <niehen.6174@gmail.com> Co-authored-by: Mick <mickjagger19@icloud.com>	2025-12-05 16:30:10 +08:00
GMI Xiao Jin	7c744d137d	[diffusion] cli: add argument --adjust-frames and --override-protected-fields (#13996) Co-authored-by: dev <devnull@example.com> Co-authored-by: Mick <mickjagger19@icloud.com>	2025-12-05 15:32:06 +08:00
zyksir	46b05ef58f	[diffusion] fix: fix bug about pin memory when offloading (#14472)	2025-12-05 15:26:30 +08:00
Mick	beec8eed6a	[diffusion] chore: further improve model searching logic (#14484)	2025-12-05 15:04:55 +08:00
Minglei Zhu	b76e303e6a	clean up gemlite usage (#14444)	2025-12-04 21:52:56 -08:00
Yinghai Lu	41429a8c10	[ez] Fix typing (#14473)	2025-12-05 12:23:13 +08:00
zyksir	fa0ca97694	[diffusion] improve: further optimize model load (#13836)	2025-12-05 10:45:20 +08:00
Junrong Lin	2ecee7571c	[Bug] fix not desired disable fused share experts caused by rocm logic (#14432)	2025-12-05 09:55:07 +08:00
Xinyuan Tong	6d37e70883	ministral3 (#14251) Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com> Co-authored-by: Yueming Yuan <yy28@illinois.edu>	2025-12-04 14:31:26 -08:00
Sam	922756aaa1	[FIX] trtllm-moe-fp4-renorm for Qwen series models (#14350)	2025-12-04 12:52:21 -08:00
YAMY	7dfcc78155	[DeepseekV3.2][NSA][Indexer] Fix PAGED top-k transform for NSA indexer chunked execution on H200 (#14325)	2025-12-04 10:25:03 -08:00
Cherry_ming	1808df48fe	[NPU]add nightly-test-npu (#14143)	2025-12-05 00:43:35 +08:00
WenhaoZhang	788628b56f	[diffusion] feat: Add Configurable Generator Device and Seed Support via API (#14366) Co-authored-by: niehen6174 <niehen.6174@gmail.com> Co-authored-by: Mick <mickjagger19@icloud.com>	2025-12-05 00:25:09 +08:00
Raul Torres	29a2d4b59f	Add 'NPU' to the runtime exception message in `get_device` (#14225) Co-authored-by: ronnie_zheng <zl19940307@163.com>	2025-12-04 17:34:31 +03:00
R0CKSTAR	079ac237da	[diffusion] fix: fix gen video doc (#14409) Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com> Co-authored-by: Mick <mickjagger19@icloud.com>	2025-12-04 22:05:38 +08:00
Daniel Cámpora	8428078436	Add Mistral Large 3 support. (#14213) Co-authored-by: elvischenv <219235043+elvischenv@users.noreply.github.com> Co-authored-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>	2025-12-04 20:00:05 +08:00
Xuchun Shang	af35023e65	[bug fix] fix ima with get_mla_kv_buffer_kernel overflow (#14224) Signed-off-by: Xuchun Shang <xuchun.shang@gmail.com>	2025-12-04 01:20:11 -08:00
jianan-gu	70d2587324	[CPU] Optimize small oc GEMM for Qwen3-next on CPU (#12446) Co-authored-by: Zheng, Beilei <beilei.zheng@intel.com>	2025-12-04 00:38:47 -08:00
Even Zhou	894c0dc57c	[NPU][1/N] NPU basic functions refactor and new modelslim quant type (#13359)	2025-12-04 16:15:31 +08:00
yctseng0211	d6c490192d	[AMD] fix the regression issue for DeepseekV3 on MI300 (#14383)	2025-12-03 23:30:11 -08:00
Yuan Luo	b2b09f5f24	[VLM] Introduce Cache for positional embedding ids for Qwen-VL family (#14292) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>	2025-12-04 12:32:00 +08:00
Kevin Li	04df80a9a1	Support PP x PD decode with nixl backend (#14392)	2025-12-04 12:22:17 +08:00
b8zhong	9d82340298	Revert "Revert "enable csgmv automatically on cuda"" (#14277)	2025-12-03 13:12:30 -08:00
alisonshao	80518bea65	Fix validation to detect missing model files before loading (#14253)	2025-12-03 11:36:07 -08:00
Lianmin Zheng	46d7b35ec7	Move custom_ops under layers; move _custom_ops.py → custom_all_reduce_ops.py (#14326)	2025-12-03 10:33:37 -08:00
Sulfur6-L8972	20aad5b5ab	Single Batch Overlap for MoE Models (#9660) Co-authored-by: Cheng Wan <wan4ch@gmail.com> Co-authored-by: Zqy11 <841971412@qq.com> Co-authored-by: AniZpZ <aniz1905@gmail.com> Co-authored-by: TianyuZhang1214 <tianyuzhang1214@gmail.com> Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>	2025-12-03 10:07:42 -08:00
Dongjie Zou	aca0d01d3f	[diffusion] doc: add vae path to cli doc#14004 (#14355) Co-authored-by: BBuf <1182563586@qq.com> Co-authored-by: Mick <mickjagger19@icloud.com>	2025-12-03 15:13:25 +00:00
Michelle Wu	443d7bcd83	[Ascend] fix AscendAttnMaskBuilder bug to support float16 models (#14271)	2025-12-03 19:37:16 +08:00
chenxu140	16d8de2284	[bugfix] NpuFuseEPMoE miss initialization parameters (#14295)	2025-12-03 19:36:41 +08:00
ZhengdQin	d122e32467	[NPU] bug fix: w_vc need contiguous for NPU batch_matmul_transpose ops (#13980)	2025-12-03 19:35:18 +08:00
Yuhao Yao	77512ae0d7	[bugfix] Fix prefill tbo disabled when --deepep-mode=auto (#14333) Co-authored-by: Cheng Wan <wan4ch@gmail.com>	2025-12-03 01:20:33 -08:00
Shangming Cai	93452a8252	[PD] Support decode pp for PD disaggregation (#14265) Signed-off-by: Shangming Cai <csmthu@gmail.com>	2025-12-03 14:35:29 +08:00
Vikram	42271376d1	[bug fix] use npu phy id in container env (#14266) Co-authored-by: jinke15 <jinke15@jd.com>	2025-12-03 11:33:43 +08:00
Johnsonms	043f13171f	[Performance] Optimize NSA Indexer K/S Buffer Access with Fused Triton Kernels (#13812) Co-authored-by: Johnsonms <johnson@together.ai>	2025-12-02 18:53:06 -08:00
Dongjie Zou	f764c6910d	[diffusion] feat: support distilled vae generic (#14195) Co-authored-by: BBuf <1182563586@qq.com> Co-authored-by: Mick <mickjagger19@icloud.com>	2025-12-03 10:27:31 +08:00
Even Zhou	7d1a130cde	Refactor custom allreduce logics (#13710) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Kangyan-Zhou <zky314343421@gmail.com> Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>	2025-12-02 17:20:05 -08:00
sglang-bot	7ae368efde	chore: bump SGLang version to 0.5.6 (#14316) Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>	2025-12-02 17:17:13 -08:00
Lianmin Zheng	ca52ed425f	Clean up imports and move files (#14317)	2025-12-02 16:31:54 -08:00
Eva20150932-atlascloud	7c38eca1e4	feat: DeepSeek new v3.2 encoding (#14249) Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>	2025-12-02 11:41:05 -08:00
Quanfeng Li	427b08e24d	Init TBO with dp_padded batch (#11423) Co-authored-by: Cheng Wan <wan4ch@gmail.com> Co-authored-by: Yuhao Yao <37280700+yuhyao@users.noreply.github.com>	2025-12-02 10:34:26 -08:00
alisonshao	0141ca370f	Revert PR #14044: Restore separate memory pool for piecewise CUDA graph (#14278)	2025-12-02 09:53:16 -08:00
alisonshao	25a6be4930	Fix duplicate download log messages in multi-process environment (#14299)	2025-12-02 09:33:18 -08:00
Mick	9530b76630	[diffusion] refactor: simplify DmdDenoisingStage (#14269)	2025-12-02 18:59:40 +08:00
Jinyan Chen	3067b3f050	[diffusion] chore: improve model info registration and searching strategy (#14281) Co-authored-by: Jinyan Chen <jinyanc@nvidia.com> Co-authored-by: Mick <mickjagger19@icloud.com>	2025-12-02 18:28:59 +08:00
Lianmin Zheng	64092c8b55	[Auto Sync] Rename is_hybrid to is_hybrid_swa (#14252) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Hanming Lu <69857889+hanming-lu@users.noreply.github.com> Co-authored-by: Hanming Lu <hanming@x.ai>	2025-12-01 23:24:24 -08:00

4935 Commits