sglang

mirror of https://github.com/kvcache-ai/sglang.git synced 2026-07-03 22:07:12 +00:00

Author	SHA1	Message	Date
ishandhanani	01e3f4682e	feat(kv-events): Add medium field to KV event types for storage tier tracking (#18205)	2026-02-09 12:39:15 -08:00
Zheng Li	27c447653d	model: support Qwen3.5 (#18489) Co-authored-by: 瑀澈 <yuche.lz@alibaba-inc.com>	2026-02-10 00:27:59 +08:00
Kurt Shuster	006da22268	Pass `quantize_config` to `_initialize_model` (#18273)	2026-02-09 23:34:42 +08:00
brimon	ddbcfbaaab	feature: support bidirectional attention for Gemma-3 (#10707)	2026-02-09 23:17:45 +08:00
Mick	4f7da5ad0f	[diffusion] chore: fix unclean shutdown and resource leaks (#18477)	2026-02-09 22:32:08 +08:00
yrk111222	76eb1c8406	[diffusion] feat: add ModelScope support (#17924)	2026-02-09 19:23:45 +08:00
Baizhou Zhang	615a02dcd4	Revert "optimize get_topk_ragged by fusing get k and k_scale triton kernel" (#18471)	2026-02-09 16:37:19 +08:00
Liangsheng Yin	875ad6cf35	Tiny rename for spec related fileds. (#18468)	2026-02-09 00:10:39 -08:00
LHXuuu	107958a489	Make compressed-tensors MoEs support ignored layers (#17828) Signed-off-by: LHXuuu <xulianhao.xlh@antgroup.com> Co-authored-by: Peng Zhang <aniz1905@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-02-09 14:37:33 +08:00
Junlin Zhou	14652243bd	[DLLM] Add JointThreshold algorithm for joint M2T and T2T decoding (#18171) Signed-off-by: Junlin Zhou <zhoujunlin.zjl@antgroup.com> Co-authored-by: Tiwei Bie <tiwei.btw@antgroup.com>	2026-02-09 14:20:45 +08:00
Bingxu Chen	3f3c201243	[AMD] Update aiter to v0.1.10.post2 (#18423) Co-authored-by: kkHuang-amd <wunhuang@amd.com> Co-authored-by: YC Tseng <yctseng@amd.com>	2026-02-08 22:08:24 -08:00
Yingchun Lai	a1189068fa	fix: fix the wrong return value type of draft model runner (#18105) Co-authored-by: Liangsheng Yin <lsyincs@gmail.com>	2026-02-08 20:51:35 -08:00
Zheng Wengang	68e31a3485	[BugFix][PD]Fix metadata_buffer_index leak when aborted in PD (#17483)	2026-02-09 11:34:29 +08:00
Shangming Cai	bffd765417	Refactoring Mooncake TE as a shared distributed component (#17810) Signed-off-by: Shangming Cai <csmthu@gmail.com>	2026-02-09 10:53:11 +08:00
Yi Zhong	bf89cc3803	[ModelOPT] Support Qwen 3 Next Coder NVFP4 (#18224) Signed-off-by: vincentzed <207368749+vincentzed@users.noreply.github.com>	2026-02-08 22:29:07 +00:00
Mohammad Miadh Angkad	071bf2ce09	[Kimi-K2.5] Fix missing `quant_config` in `KimiK25` (#18440)	2026-02-08 12:02:45 -08:00
Piotr Mazurek	656a3d742e	Add tensor parallelism support to LFM2 ShortConv layers (#17777)	2026-02-09 00:52:47 +08:00
Mick	6601bc24da	[diffusion] chore: revise process title (#18446)	2026-02-09 00:14:06 +08:00
debo3	031a652b93	Fix TRT-LLM MLA backend applying k_scale to BF16 KV cache in BMM1 (#18396)	2026-02-08 23:11:16 +08:00
Mick	a41aff1243	[diffusion] refactor: group component loaders under the component_loaders/ directory (#18438)	2026-02-08 23:02:27 +08:00
Yi Zhong	ca36d88fa6	[ModelOpt] Fix broken Qwen3-235B-A22B-Instruct-2507-NVFP4 launch (#18189) Signed-off-by: vincentzed <207368749+vincentzed@users.noreply.github.com>	2026-02-08 14:35:28 +00:00
wxy	43eecd8265	[diffusion] feat: support efficient sequence shard (#18161)	2026-02-08 21:09:39 +08:00
Zack Yu	d71ccd8860	fix: sync server_args.kv_cache_dtype when detecting FP8 KV cache (#18394)	2026-02-08 14:10:59 +08:00
DarkSharpness	8e2e835c2f	[Fix] Fix backend selection after flashinfer version update (#18364)	2026-02-08 11:20:41 +08:00
Makcum888e	00248d85c7	[diffusion] platform: support WAN/FLUX/Qwen-Image/Qwen-Image-edit on Ascend (#13662) Co-authored-by: dhx98 <haox.dai@gmail.com> Co-authored-by: DHX98 <haoxiand@andrew.cmu.edu> Co-authored-by: ronnie_zheng <zl19940307@163.com> Co-authored-by: DHX98 <DHX98@noreply.gitcode.com> Co-authored-by: Yuhao Yang <47235274+yhyang201@users.noreply.github.com>	2026-02-08 10:45:30 +08:00
Mohammad Miadh Angkad	7b83659310	fix: fix NVFP4 Kimi-K2.5 weight mapping and exclude list (#18370)	2026-02-08 10:23:48 +08:00
wxy	64950d8f97	[diffusion] feat: support saving videos directly on the server to avoid the overhead of tensor transfer (#18253)	2026-02-07 22:08:42 +08:00
Mick	31d4cd2ffd	[diffusion] fix: respect dist_timeout option (#18386)	2026-02-07 20:56:04 +08:00
Mohammad Miadh Angkad	fddef76619	[Doc] Fix outdated `--fp4-gemm-backend` documentation (#18350)	2026-02-07 20:42:47 +08:00
Hao Jin	d792aa7618	[diffusion] fix: remove unnecessary norm_type argument from GLM-Image dits (#18382) Co-authored-by: Hao Jin <Hao Jin>	2026-02-07 20:35:12 +08:00
Baizhou Zhang	eb4cf1dfc4	[CI] Skip some flaky subtests for test_multi_lora_backend.py (#18408)	2026-02-07 19:06:53 +08:00
Xiaoyu Zhang	baec650462	[Diffusion] Apply fused_norm_scale_shift to LTX2/MOVA (#18257) Co-authored-by: yihanc <yingluosanqian@gmail.com>	2026-02-07 17:28:42 +08:00
赵晨阳	1552aab741	Support execute_shell_command for env var support (#18390)	2026-02-07 12:33:29 +08:00
hlu1	4637970dfb	[Qwen3Next] Optimize fused_sigmoid_gating_delta_rule_update_kernel (#18271)	2026-02-07 11:59:42 +08:00
Neal Vaidya	f1ff697494	add hybrid model PD to NIXL connector (#16229) Signed-off-by: Neal Vaidya <nealv@nvidia.com> Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>	2026-02-06 15:05:36 -08:00
Prozac614	e13b727e92	[diffusion] CI: update perf baseline (#17512)	2026-02-07 00:28:44 +08:00
shaharmor98	c6aa1863be	Add Nemotron 3 Nano tests (#18119) Signed-off-by: Shahar Mor <smor@nvidia.com>	2026-02-06 23:55:42 +08:00
xiaoye	79d409f210	[diffusion] fix: offload text encoder model in image encoding stage (#18317)	2026-02-06 22:55:56 +08:00
Xuchun Shang	3d68bd9d9b	add hicache jit test (#17847) Signed-off-by: Xuchun Shang <xuchun.shang@linux.alibaba.com>	2026-02-06 16:54:33 +08:00
陈一涵	f798ab9775	[diffusion] fix: fix torch.compile graph break caused by torch._dynamo.disable (#18336)	2026-02-06 14:48:09 +08:00
Alison Shao	d0c39bc219	Fix cross-container HF download race condition in CI (#18328)	2026-02-05 21:01:41 -08:00
Linyu Wu	aa390d2762	[Kernel] Migrate GPTQ-Marlin GEMM kernel to JIT (#18067)	2026-02-06 08:31:42 +08:00
aaaandychen	6a4b81e2d9	Refactor(qwen3-vl) optimize position encoding interpolation (#16781) Signed-off-by: chenzhenyang <andy271828@163.com> Signed-off-by: chenzhenyang <chenzhenyang@moonshot.cn> Co-authored-by: chenzhenyang <chenzhenyang@moonshot.cn> Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>	2026-02-05 10:26:35 -08:00
ovidiusm	498d8d0680	NixlKVManager optimizations (#17654) Signed-off-by: Ovidiu Mara <ovidium@nvidia.com>	2026-02-06 00:25:23 +08:00
wxy	b639779dd8	[diffusion] feat: allow T5's TP Group to reuse the transformer's SP Group (#17818) Co-authored-by: Mick <mickjagger19@icloud.com>	2026-02-06 00:12:19 +08:00
Glen Liu	3f32a5831d	throw error if got adapter with added_tokens (#18046)	2026-02-05 23:55:43 +08:00
pansicheng	2eb4359ada	[Kernel] Add JIT apply_rope_with_cos_sin_cache_inplace (#18155)	2026-02-05 21:49:37 +08:00
陈一涵	4aa03d91fd	[diffusion] fix: fix accuracy bug caused by #14717 (#18296)	2026-02-05 20:36:18 +08:00
Shangming Cai	afae4c7178	[PD] Minor code cleanup for mooncake backend (#18279) Signed-off-by: Shangming Cai <csmthu@gmail.com>	2026-02-05 17:38:09 +08:00
zhangheng	079fc8f3c5	[piecewise graph]: support MiniMax-M2 (#18217)	2026-02-04 23:24:38 -08:00

6437 Commits