sglang

mirror of https://github.com/kvcache-ai/sglang.git synced 2026-07-03 13:57:04 +00:00

Author	SHA1	Message	Date
Lianmin Zheng	9815ee934c	[Auto Sync] Update weight_utils.py (20260212) (#18692 ) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Dan Zheng <dzheng@x.ai>	2026-02-12 16:26:05 -08:00
Hao Jin	b48fe1d95e	[Diffusion] [BUG] Fix missing initialization of GLM-Image text encoder config (#18704 ) Co-authored-by: Hao Jin <Hao Jin>	2026-02-12 16:19:22 -08:00
Shangming Cai	2a8a48c0ca	Reuse initialized transfer engine in mooncake store (#18460 ) Signed-off-by: Shangming Cai <csmthu@gmail.com>	2026-02-13 01:21:35 +08:00
Yi Zhang	b168723424	[BUGFIX] fix bug in handle mamba radix cache in server_args (#18723 )	2026-02-12 21:33:32 +08:00
Simo Lin	92c5749f41	refactor: replace local proto compilation with smg-grpc-proto package (#18682 )	2026-02-12 05:29:24 -08:00
Scott Lee	c59b9223e6	Add `spec_accept_histogram` request statistic (#18332 )	2026-02-12 21:09:21 +08:00
Hudson Xing	f3656432c7	add tool_choice=auto nightly test case (#18302 )	2026-02-12 19:28:05 +08:00
Thomas Wang	e20e6c28b9	[AMD] Fix accuracy issue when running TP4 dsv3 model with mtp (#18607 ) Co-authored-by: YC Tseng <yctseng@amd.com> Co-authored-by: kkHuang-amd <wunhuang@amd.com>	2026-02-12 01:13:16 -08:00
chenxu214	1edc69be08	[Ascend]Support qwen3.5 (#18544 ) This PR affects only the NPU. If any issues arise, please contact iforgetmyname.	2026-02-12 15:22:47 +08:00
Vinh H. Pham	feaa9e7e00	[diffusion] fix: replace TextEncoderConfig with Qwen3TextConfig for Z-Image (#18560 ) Co-authored-by: Mick <mickjagger19@icloud.com>	2026-02-12 14:31:23 +08:00
JooYeon	c9297661b9	fix: /metrics endpoint always reports engine_type="unified" in PD disaggregation mode (#18552 ) Co-authored-by: joo_yeon.lee <joo_yeon.lee@samsung.com>	2026-02-12 14:20:43 +08:00
Li Jinliang	d91ce176bf	Update README commands to include model-path option (#18557 ) Co-authored-by: Mick <mickjagger19@icloud.com>	2026-02-12 14:15:26 +08:00
Zheng Li	4ed2548427	[Qwen3_5] Refactor `Qwen3_5ForCausalLMMTP` class implementation (#18538 )	2026-02-12 13:38:26 +08:00
YAMY	454676811e	[Flashinfer Autotune] Fix FlashInfer FP4 MoE autotuning crash by removing incorrect flatten on hidden_states_scale (#18500 )	2026-02-12 13:31:27 +08:00
YC Tseng	20554a0a4f	[AMD] rocm 7.2 image release, PR test, Nightly Test (#17799 ) Co-authored-by: Alan Kao <akao@amd.com> Co-authored-by: bingxche <Bingxu.Chen@amd.com> Co-authored-by: Michael <13900043+michaelzhang-ai@users.noreply.github.com>	2026-02-11 21:29:25 -08:00
danielafrimi	e422bcaed8	[Mamba] Add float16 support for SSM cache dtype (#18444 )	2026-02-12 11:27:47 +08:00
Zhiyu	7e262b6496	Update modelopt quantization config parsing (#13919 ) Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>	2026-02-12 11:08:29 +08:00
R0CKSTAR	41e1fd0be7	[diffusion] fix: webui cannot correctly display generated video using wan2.2 (#18473 ) Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> Co-authored-by: Mick <mickjagger19@icloud.com>	2026-02-12 10:35:39 +08:00
Yi Zhong	dc1309fc7e	Avoid kimi linear stream sync (#16186 ) Signed-off-by: vincentzed <207368749+vincentzed@users.noreply.github.com>	2026-02-12 09:27:22 +08:00
Jiayi Yan	539bbf485c	[Bugfix] fix config bug caused by PR #18273 (#18535 )	2026-02-12 09:26:46 +08:00
Yuwei An	2bd8363486	[PCG] GPT OSS Triton Kernel Support (#18405 ) Signed-off-by: Oasis-Git <ayw.sirius19@gmail.com>	2026-02-12 09:23:55 +08:00
qianyue76	f06ab17a73	[diffusion] docs: consolidate diffusion documentation into docs (#18095 ) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: JiaxinD <djx2048@gmail.com>	2026-02-11 16:55:07 -08:00
Lianmin Zheng	5875ef0a34	Clean up noisy startup log messages and refactor loader.py (#18531 )	2026-02-11 16:12:57 -08:00
Piotr Mazurek	ded068a76e	Add LMF2 MoE model architecture (#17997 )	2026-02-12 01:03:43 +08:00
Ke Bao	5d185efb78	Fix prefill stats for dllm (#18632 )	2026-02-12 01:00:30 +08:00
Vedant V Jhaveri	98b5013d59	add support to enable lora with embedding models (#17780 ) Co-authored-by: Vedant Jhaveri <vjhaveri@linkedin.com>	2026-02-11 23:19:40 +08:00
Baizhou Zhang	947927bdb5	[V3.2] Change default CP token split method to `--round-robin-split` (#18613 )	2026-02-11 20:14:35 +08:00
McZyWu	4f7422f7ba	[NPU] support model skywork-reward-gemma2-2-27B-v0.2 (#16947 ) Co-authored-by: cy <chenyang08056032@163.com>	2026-02-11 15:34:53 +08:00
sky	72c1526657	Register cp-atten-allgather buffers with symm memory (#17756 ) Signed-off-by: wangfakang <fakangwang@gmail.com>	2026-02-11 15:26:37 +08:00
Thomas Wang	a8eef53dc4	Fp8 prefill attn kernel integration (#18528 ) Co-authored-by: kkHuang-amd <wunhuang@amd.com>	2026-02-10 23:23:48 -08:00
BourneSun0527	2cc235e795	Fix Bug on dsv3.2 (#18553 ) This PR affects only the NPU. If any issues arise, please contact iforgetmyname.	2026-02-11 14:39:01 +08:00
Michael	d84d2063d3	[AMD] Fix Janus-Pro crash and add Kimi-K2.5 nightly test (#18269 )	2026-02-10 22:33:13 -08:00
Liangsheng Yin	cd90346a2b	Add cache hit rate UT (#18566 )	2026-02-10 21:27:41 -08:00
cutetocute	8d2892330c	chore: fix some typos (#18577 ) Co-authored-by: Liangsheng Yin <lsyincs@gmail.com>	2026-02-10 20:47:41 -08:00
Liangsheng Yin	93fca0bbc3	Fix wrong prefill log. (#18570 )	2026-02-10 15:54:03 -08:00
Yi-Chia Chen	2bfab1bb67	Fix radix cache key to include generated tokens in multi-turn (regression) (#16521 ) Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>	2026-02-10 14:08:34 -08:00
Thomas Wang	4262f5259b	Tilelang sparse decode fwd for dsv32 mi355 (#18488 ) Co-authored-by: kk <43161300+kkHuang-amd@users.noreply.github.com>	2026-02-10 09:53:26 -08:00
Zheng Li	44603764d6	fix(config): Support setting Mamba state dtype via config file (#18532 ) Co-authored-by: 瑀澈 <yuche.lz@alibaba-inc.com>	2026-02-11 00:20:06 +08:00
Mick	efcdda0176	[diffusion] fix: fix fsdp (#18187 )	2026-02-10 20:22:20 +08:00
wxy	47978ee858	[diffusion] feat: support parallel wan-vae decode (#18179 )	2026-02-10 18:32:00 +08:00
Zehuan Li	26f2b3798d	[DLLM] Basic dLLM scheduling strategy and implementation (#17484 ) Signed-off-by: Zehuan Li <lizehuan.lzh@antgroup.com>	2026-02-10 16:54:15 +08:00
shuwenn	8da14aea88	[HiCache] fix: StorageMetricsCollector was initialized twice (#18354 )	2026-02-10 00:53:00 -08:00
Xinyuan Tong	398b81f78c	Support GlmMoeDsaForCausalLM (#18521 ) Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com> Signed-off-by: BBuf <1182563586@qq.com> Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com> Co-authored-by: BBuf <1182563586@qq.com>	2026-02-10 15:20:10 +08:00
Xinyuan Tong	e8a2c13380	Deepseekv32 compatibility with transformers v5 (#18297 ) Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com> Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com> Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>	2026-02-10 14:50:40 +08:00
siyu	0b15f19927	[EPD] Add notification mechanism to fix server hang and add timeout env var (#18229 )	2026-02-10 11:52:54 +08:00
maocheng23	1d366f1206	Make bench_one_batch_server compatible for more backends (#18512 )	2026-02-10 10:36:39 +08:00
Qiaolin Yu	4a1b50bb2d	Fix idle batch predict dtype in spec v2 (#18379 )	2026-02-10 10:29:13 +08:00
Kartik Ramesh	26a006e47f	Add cache_config_info metric. (#17273 )	2026-02-09 16:09:09 -08:00
Lianmin Zheng	b027c5aca6	[Auto Sync] Update cache_init_params.py (20260209) (#18502 ) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Hanming Lu <69857889+hanming-lu@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-02-09 14:49:41 -08:00
Lianmin Zheng	ce95f203b0	[Auto Sync] Update logits_processor.py (20260209) (#18503 ) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>	2026-02-09 14:33:02 -08:00

6437 Commits