sglang

mirror of https://github.com/kvcache-ai/sglang.git synced 2026-07-03 22:07:12 +00:00

Author	SHA1	Message	Date
Mick	1a006c2a0d	[diffusion] refactor: split component_loader into component-wise files (#17820)	2026-01-31 20:22:31 +08:00
Lianmin Zheng	7412ceb4eb	[Auto Sync] Update linear.py to assert shapes (20260130) (#17966) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Byron Hsu <byronhsu1230@gmail.com>	2026-01-31 01:01:55 -08:00
Lianmin Zheng	0e184609d3	Add launch_command assignment in crash dump (#17967) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Archit Patke <apatke@x.ai>	2026-01-31 01:00:40 -08:00
b8zhong	22498e10c0	[Fix] Triton TP MoE Dpsk V3/Qwen3 Coder with SwapAB (#17965)	2026-01-31 15:56:26 +08:00
Zheng Wengang	a4df95c15f	[EPD][Perf] parallelize ZMQ send for encode server (#16487) Co-authored-by: siyu <liusy58@linux.alibaba.com>	2026-01-31 14:30:11 +08:00
jeff	04efd03dbf	Fix OOM in DeepSeek weight loading by deferring dict(weights) materialization (#17744)	2026-01-31 13:59:00 +08:00
Hudson Xing	c72bf50706	add reasoning_tokens usage test for tool call (#18022)	2026-01-30 21:09:23 -08:00
Mohammad Miadh Angkad	d0d9cecd1b	Fix cuBLAS >=12.9 detection for cu12/cu13 package naming (#17766)	2026-01-31 12:01:52 +08:00
Xiaoyu Zhang	22aad4e2c4	[Diffusion] Fix FLUX.1-schnell time embedding argument mismatch (#17988)	2026-01-31 11:47:27 +08:00
Bi Xue	5d00150e99	[sglang] fix mm token padded value overlap with text token id (#17781)	2026-01-30 17:09:13 -08:00
JiaruiChang5268	e86476acfc	[NPU] support llama-3.2-11B-vision-instruct mode for NPU (#17492) Co-authored-by: McZyWu <zhuoyun.wu.23@ucl.ac.uk> Co-authored-by: chenyang08056032 <chenyang08056032@163.com> Co-authored-by: Hexq0210 <893781835@qq.com>	2026-01-31 08:49:38 +08:00
Siyuan Chen	578b119bc6	[BugFix] Fix server crashes when req.grammar and ngram spec are enabled (#17585)	2026-01-30 11:57:42 -08:00
Sam (Kesen Li)	81449b4bee	Optimize GDN decode for Qwen3 Next (#17094)	2026-01-31 01:02:12 +08:00
Xiaoyu Zhang	abf13ccc11	[Diffusion] Fix lora default lora_scale bug (#17982)	2026-01-30 22:04:54 +08:00
Zheng Li	0c5a81acb8	[BUGFIX] Fix dp size > 1 for qwen3 vl model (#17624) Co-authored-by: yizhang2077 <1109276519@qq.com>	2026-01-30 20:44:25 +08:00
Changhun Lee	c04efe030a	[Model] Add K-EXAONE model support (#16294) Signed-off-by: lkm2835 <lkm2835@gmail.com> Co-authored-by: lgai-exaone <exaonemodels@lgresearch.ai> Co-authored-by: lkm2835 <lkm2835@gmail.com> Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>	2026-01-30 20:01:14 +08:00
Fan Yin	8ce9609fa2	fix: fix SHM pointer re-serialization in DP attention (#17930)	2026-01-30 17:03:30 +08:00
Ke Bao	77a27e728c	Add cuda graph status to prefill log (#17836)	2026-01-30 16:56:53 +08:00
Haotong Zhang	c8dc543dc5	SGLang Tracing: Improve root span attributes (#17008) Signed-off-by: zhanghaotong <zhanghaotong.zht@antgroup.com>	2026-01-30 16:02:05 +08:00
jianzhao-xu	96584ab692	adapt MODELSCOPE download (#17922)	2026-01-29 23:26:54 -08:00
McZyWu	70db3398d1	[NPU] enhance accuracy for model kimi-vl-a3b-instruct (#17480) Co-authored-by: cy <chenyang08056032@163.com>	2026-01-30 15:19:42 +08:00
jianan-gu	c35aa0238c	[CPU][INT4] Add INT4 kernels for CPU (#8226) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-01-29 22:30:13 -08:00
jianan-gu	336dc4579e	[CPU] Optimize Qwen3-next model on CPU (#12525) Co-authored-by: Ma Mingfei <mingfei.ma@intel.com> Co-authored-by: Fan Yin <1106310035@qq.com>	2026-01-29 22:03:58 -08:00
Polisetty V R K Jyothendra Varma	71e4d3b6bc	[Intel GPU] fix import error to run DeepSeek-V2-Lite model with BF16 on XPU (#10858)	2026-01-29 21:53:53 -08:00
gaopengff	7541da15d2	Fix prefill latency performance drop of bench serving (#14592)	2026-01-29 21:28:17 -08:00
Polisetty V R K Jyothendra Varma	858dc80aff	[Intel GPU] fix device in DeepseekScalingRotaryEmbedding to run DeepSeek-V2-Lite BF16 on XPU (#10021) Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>	2026-01-29 21:21:38 -08:00
LHXuuu	0e4d9ddbd6	Fix the scenario where eh_proj is quantized in the bailing moe nextn weights (#17808) Signed-off-by: LHXuuu <xulianhao.xlh@antgroup.com>	2026-01-29 21:11:47 -08:00
Kangyan-Zhou	606ff09ef8	[Fix] Remove unused Type import in gpt_j.py (#17975) Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-29 21:11:11 -08:00
Koushik Dutta	632c7afa8c	[Fix] add block size logic for sm120 smem size (#14311)	2026-01-29 21:01:57 -08:00
Wenchen Lo	046b29be16	GPTJForCausalLM Support (#7839) Co-authored-by: b8zhong <b8zhong@uwaterloo.ca>	2026-01-29 21:00:04 -08:00
b8zhong	22df62d586	add weightless qk norm to RMSNorm interface for Llama 4 (#12813) Co-authored-by: Brayden Zhong <b8zhong@users.noreply.github.com>	2026-01-29 19:09:55 -08:00
baonudesifeizhai	84ab611af8	model: support DeepSeek-OCR-2 (#17897)	2026-01-30 09:49:51 +08:00
StonyPort	2b3408ff14	feat: add forward timeout (#17831) Co-authored-by: qiuxuan.lzw <qiuxuan.lzw@alibaba-inc.com>	2026-01-30 08:52:29 +08:00
Cheng Wan	6a6b36367e	Fix logprob_start_len handling for prefill-only requests (#17395)	2026-01-29 15:14:43 -08:00
Kangyan-Zhou	c3bf53c7c1	Fix ci weight validation logic to check the safetensor completeness (#17917)	2026-01-29 13:00:42 -08:00
Cheng Wan	a416af4be7	Fix capture_sizes range for pcg (#17956)	2026-01-29 12:46:35 -08:00
EduardDurech	1b6798a6a4	Fix `torch.__version__` for PEP440 (#15682)	2026-01-29 11:55:13 -08:00
Hudson Xing	d417c6809e	Add tool call tests for DeepSeek V3.2 in nightly CI (#17951)	2026-01-29 09:50:54 -08:00
Ratish P	88fb927cc9	[diffusion]: add dummy device attribute to fix AttributeError (#17949)	2026-01-29 09:35:12 -08:00
Shivam jindal	0769de9b0f	Support LightOnOCR-2-1B (#17806)	2026-01-29 23:03:41 +08:00
Ziang Li	3c9cc44ff5	Add mxfp8 support for online quantization, Triton dense linear, and CUTLASS MoE (#17449)	2026-01-29 21:33:57 +08:00
Yuhao Yang	3c2f4c7bbe	[diffusion] model: sync with upstream z-Image (#17822)	2026-01-29 21:10:11 +08:00
RoyWang	30adf78f82	[diffusion]: align sglang diffusion AMD pyproject_other.toml diffusion dependency with pyproject.toml (#16225) Co-authored-by: roywang <roywang@amd.com>	2026-01-29 01:50:57 -08:00
kk	ef1c512754	Add aiter bias moe support in gpt-oss mxfp4 model (#17735) Co-authored-by: wunhuang <wunhuang@amd.com>	2026-01-29 01:50:11 -08:00
triple-mu	319f6886fe	[diffusion] model: move tp_rmsnorm check to WanTransformerBlock (#17792)	2026-01-29 16:39:00 +08:00
Zhang Yiyang (SII)	cdedbf1486	[diffusion] fix: resolve library mismatch in scheduler and update dit offload method name (#17916)	2026-01-29 15:54:36 +08:00
22dimensions	7b79326751	[NPU] support GPTQ quantization on npu (#15203) Signed-off-by: 22dimensions <waitingwind@foxmail.com>	2026-01-29 15:48:18 +08:00
Niko Ma	cbf90d70ff	[PD] Support KV transfer with MORI-IO (#14626) Co-authored-by: cwortman-amd <cwortman@amd.com>	2026-01-28 23:22:41 -08:00
R0CKSTAR	d3cdee0a04	[MUSA][4/N] Add common device utilities, distributed backend, and custom op wiring (#17246) Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>	2026-01-28 23:13:24 -08:00
Xinyuan Tong	9409c43593	Fix flaky tool calls in the Kimi K2.5 model (#17914) Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>	2026-01-28 20:58:16 -08:00

6437 Commits