sglang

mirror of https://github.com/kvcache-ai/sglang.git synced 2026-07-04 22:37:18 +00:00

Author	SHA1	Message	Date
Xinyuan Tong	3c34d2c3eb	[FIX] kimi_k2 reasoning parser (#17901) Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>	2026-01-28 19:47:09 -08:00
Joe Redmond	0ff0d181ca	feat: add custom request header logging (#17786)	2026-01-28 19:33:08 -08:00
kk	f1384f5293	Integration mori backend for EP a2a data communication (#17012) Co-authored-by: Duyi-Wang <duyi.wang@amd.com> Co-authored-by: billishyahao <bill.he@amd.com> Co-authored-by: HaiShaw <hixiao@gmail.com>	2026-01-28 19:07:34 -08:00
Jerry Ji	673dc09d9b	[Fix][trtllm-mha] Canonicalize the strides when num_head = 1 (#17732)	2026-01-29 10:11:18 +08:00
Qi Yuhang	0368ddf9ea	[JIT Kernel]Support fused_add_rmsnorm in JIT Kernel (#17677)	2026-01-29 09:29:59 +08:00
Zhang Yiyang (SII)	09a9147f59	[diffusion] model: support MOVA (#17704) Co-authored-by: gaoyang07 <Gary1546308416AL@gmail.com> Co-authored-by: cms42 <c@cms42.top> Co-authored-by: cms42 <44895820+cms42@users.noreply.github.com> Co-authored-by: Ruixiao Li <cgruixiao@outlook.com> Co-authored-by: Li Ruixiao(SII) <80368770+Li-dongyang@users.noreply.github.com>	2026-01-29 09:12:08 +08:00
Prozac614	3fcda00e8c	[CI] Fix CI timeouts by upgrading runai_model_streamer (related to #16937) (#17636)	2026-01-28 17:09:45 -08:00
Lianmin Zheng	d4180815a4	Make the functions in logits_processor.py and sampler.py more modular (#17885)	2026-01-28 16:24:23 -08:00
jackey hua	0998de088b	[Perf] Tune Llama-4-Scout-17B-16E-Instruct fused moe kernel (#17891)	2026-01-28 14:06:46 -08:00
gingerXue	e9d727cb92	[MUSA][7/N] Enhance CUDA / PyNccl wrapper to support MTLink connectivity detection (#17499) Signed-off-by: jingzhi.xue <jingzhi.xue@mthreads.com> Co-authored-by: jingzhi.xue <jingzhi.xue@mthreads.com>	2026-01-28 11:36:30 -08:00
Артем Савкин	b77b0ffd60	[NPU] NZ for non-quantized MOE, Qwen3 MOE double memory consumption fix (#15904) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-01-29 00:55:08 +08:00
Jinn	1953efb60e	[AMD] ROCm: route W4A16 MoE to Triton and fix packed-weight loading (#17863)	2026-01-28 08:20:23 -08:00
triple-mu	1d1e72e516	[diffusion] fix: fix comfyui import typo (#17834)	2026-01-28 23:49:55 +08:00
Xiaoyu Zhang	c08b54a575	[JIT kernel] Update jit_kernel cache and develop doc (#17842)	2026-01-28 15:09:47 +08:00
Mick	2573a262af	[diffusion] doc: fix wrong docker run command (#17856)	2026-01-28 14:52:33 +08:00
Ziang Li	a8dda2aa57	[DSv32] Overlap indexer qk projection and activation quant (#17688)	2026-01-28 11:46:49 +08:00
Yisheng Gong	1c4616a034	fix: add bias when enable mm fallback variant (#17690)	2026-01-28 09:50:49 +08:00
陈一涵	647428d8d6	[diffusion] perf: apply mul add fusion for Qwen-Image (#16299) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-01-28 09:40:13 +08:00
Yashika Gandhi - Google	32ea7bcdd8	[diffusion] endpoint: fix vertex generate (#17611)	2026-01-28 09:38:56 +08:00
Mick	88fcd8535f	[diffusion] feat: add an arg for controlling the number of prefetched layers in layerwise-offload (#17693)	2026-01-28 09:34:27 +08:00
Mick	1507dc6cdf	[diffusion] fix: fix suppressing error log on non-main ranks (#17712)	2026-01-28 09:29:19 +08:00
Xiaoyu Zhang	331a22427c	[Diffusion] glm-image apply flashinfer rope (#17689)	2026-01-28 08:51:37 +08:00
siyu	4d00bd17a3	use shared memory for multimodal feature transport between Tokenizer and Scheduler (#16402) Co-authored-by: Yuhao Yang <47235274+yhyang201@users.noreply.github.com>	2026-01-27 11:01:08 -08:00
Minglei Zhu	d90c0837e5	[hybrid-model] clean up and consolidate redundant fields in RadixLinearAttention (#17660)	2026-01-27 10:37:58 -08:00
fsygd	547e2d037e	[diffusion] refactor: add arg to control the precision of dit (#17751)	2026-01-27 23:01:23 +08:00
monkeyLoveding	d578b41bad	[NPU] Adapt cann 8.5: use sfa and lightning indexer op from cann and CI update (#17615) Co-authored-by: Kelon <kelonlu@163.com>	2026-01-27 19:03:53 +08:00
MikkoParkkola	c56d19b977	fix(quantization): add sgl_kernel fallback for FP4 quantize on Blackwell GPUs (#17816)	2026-01-27 18:43:17 +08:00
Xuchun Shang	dba264ac73	[PP] fix wrong weight logic for tie_word_embeddings model (#15890) Signed-off-by: Xuchun Shang <xuchun.shang@gmail.com>	2026-01-27 17:41:17 +08:00
Yuxuan Zhang	7106f6c8e1	[GLM-OCR] Support GLM-OCR Model (#17582) Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com> Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com> Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>	2026-01-26 22:24:00 -08:00
Taemin Jung	81c0f5c5ad	[Model] Add support for EXAONE-4.0 Model (#8205) Signed-off-by: BoxBy <lute7071@gmail.com> Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>	2026-01-27 14:08:24 +08:00
laixin	6c9b054ab7	[Bug Fix] Fix reasoning parser when continue_final_message=true (#17065) Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>	2026-01-27 14:04:44 +08:00
shuwenn	57e432d951	fix: preserve disconnect events in api key middleware (#17253)	2026-01-26 22:48:24 -05:00
shuwenn	fd3b179ffd	[HiCache][HA 1/N] Support HiCache storage runtime attach/detach (#15892)	2026-01-26 19:33:19 -08:00
Zhongdongming Dai	1b56a886bb	[chore]: improve time tracing of model loading process (#15426) Co-authored-by: Michael Shin <mmshin@nvidia.com> Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>	2026-01-26 19:04:25 -08:00
Yuhao Yang	479ab7a4e7	model: support Kimi-K2.5 (#17789) Co-authored-by: Mick <mickjagger19@icloud.com>	2026-01-27 10:57:00 +08:00
WenhaoZhang	0519b0935f	[diffusion] comfyui: support Qwen-Image, Multi-GPU Z-Image, and Enhanced ComfyUI Integration (#17678) Co-authored-by: niehen6174 <niehen.6174@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-01-27 10:06:42 +08:00
FlyPanda	2d8c22a15e	[bugfix] Internal processing of hf3fs crash # 16614 (#16938)	2026-01-26 18:01:50 -08:00
Mahdi-CV	539924037f	fix(processor): support InternS1 text_config in InternVL processor (#17040) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-01-26 13:02:54 -08:00
ybyang	5ab76ff220	Special logic for healthcheck (#17734) Co-authored-by: Liangsheng Yin <lsyincs@gmail.com>	2026-01-26 10:26:40 -08:00
Liangsheng Yin	85d077f44d	Introduce global `alloc_len_per_decode` & clean check decode memory (#15115)	2026-01-26 10:26:20 -08:00
Makcum888e	bba6e38ff8	[NPU] Split pyproject npu from pyproject other (#17641)	2026-01-26 09:45:44 -08:00
Yuan Luo	7bb41989fa	[1/N] Optimize All Reduce - Benchmark different AR operations (#13797) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>	2026-01-26 22:44:13 +08:00
lawtherWu	b56366f827	[NPU]DeepSeek-V3.2 support npu mlaprolog (#15381) Co-authored-by: Zhengda Qin <zhengdqin@gmail.com> Co-authored-by: richhuan <huan_rz@qq.com>	2026-01-26 20:42:37 +08:00
Yi Zhang	5844cb2fd8	refactor mamba radix cache logic in server_args (#17645)	2026-01-26 17:02:49 +08:00
shaharmor98	f6f1b6d000	Bump FI version (#17700) Signed-off-by: Shahar Mor <smor@nvidia.com> Co-authored-by: b8zhong <b8zhong@uwaterloo.ca>	2026-01-26 16:50:06 +08:00
McZyWu	2734b23481	accuracy enhancement for baichuan2-13B for npu (#16868) Co-authored-by: cy <chenyang08056032@163.com>	2026-01-26 16:14:35 +08:00
Prozac614	12f794e516	[diffusion] fix: fix missing backend argument in pipelines_core initialization (#17343)	2026-01-26 15:47:10 +08:00
Kangyan-Zhou	48f4340b14	Exclude some diffusion package for ARM in docker release (#17745)	2026-01-25 23:32:39 -08:00
Alison Shao	30b3192039	Merge performance/accuracy test suites into regular stage-b suites (#17609)	2026-01-25 22:49:19 -08:00
CSWYF3634076	1a19b3987d	[Model] Add Ernie4.5 VL model support (#15679) Signed-off-by: CSWYF3634076 <wangyafeng@baidu.com> Signed-off-by: wangyafeng <wangyafeng@baidu.com>	2026-01-25 22:36:29 -08:00

6437 Commits