sglang

Mirror of https://github.com/kvcache-ai/sglang.git, synced 2026-07-01 12:17:09 +00:00

Author	SHA1	Message	Date
Mick	355fcbcc17	[diffusion] fix: fix cache dit refresh none mask (#22374)	2026-04-09 11:58:24 +08:00
jsheng_Linkedin	6838a23226	[Feature] Add token embedding overrides for sparse embedding replacement (#20960) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-08 20:51:36 -07:00
Kurkur	a69be2e866	[Feature] Support eagle3 for qwen3-vl (#22230)	2026-04-09 11:45:36 +08:00
Lianmin Zheng	ddc8ef1038	Lazy import flash_attention_v4 to avoid loading flash_attn.cute at startup (#22306)	2026-04-08 20:40:25 -07:00
Khoa Pham	f127d67823	[Spec][Ngram] Misc enhance support for multiple SAMs (#22294)	2026-04-08 19:56:23 -07:00
Kangrui Du	1b7c33a5b7	[diffusion] rl: revamp rollout Log-Prob support with SDE/CPS for RL post-training (#21204) Co-authored-by: MikukuOvO <mikukuovo@gmail.com>	2026-04-09 09:00:00 +08:00
Liangsheng Yin	1e3f6ebea6	[core] Extract pool sizing logic to pool_configurator.py (#22384)	2026-04-08 16:13:21 -07:00
Baizhou Zhang	4e5b8cb041	Fix get_version_tag.py to handle dot-separated post versions (#22385) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-08 15:18:22 -07:00
sglang-bot	df3275bd6c	chore: bump flashinfer version to 0.6.7.post3 (#22382) Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>	2026-04-08 14:49:45 -07:00
Yufeng He	c89afaea7c	Fix hybrid_linear_attn_backend crash with ngram speculation (#20739) Co-authored-by: kpham-sgl <khoa.pham@radixark.ai> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-08 12:52:07 -07:00
YAMY	c26b8b4a4b	[GDN] Remove FlashInfer GDN decode + no_buffer guard and default to FlashInfer on SM100+ (#21861)	2026-04-08 11:59:15 -07:00
Kurt Shuster	db30a63a13	[sgl-kernel] support > 1024 experts in moe_align_block_size kernel (#21610)	2026-04-08 11:45:13 -07:00
Mick	4ac6fa0d87	[diffusion] fix: fix loading multiple ckpts with different precision for a same module (#22360)	2026-04-09 02:44:19 +08:00
Yihao Wang	a5ed507a16	[refactor] [asr] add transcription adapter for extensible ASR models support (#22181)	2026-04-09 01:19:37 +08:00
Yihao Wang	ae8da14ea3	[fix] [whisper] ensure inputs are moved to the correct device before processing. (#22293)	2026-04-08 23:45:42 +08:00
Xiaoyu Zhang	b5b2dbe05f	[Diffusion] Add diffusion NVFP4 scaled-mm correctness test (#22127) Co-authored-by: Mick <mickjagger19@icloud.com>	2026-04-08 22:07:24 +08:00
zhaozx-cn	33c9cc8994	[NPU] fix qwen3.5 video processor (#22266)	2026-04-08 21:13:29 +08:00
Fergus	413913763f	fix: wrap _import_static_state in inference_mode to fix resume on Blackwell (#21035)	2026-04-08 02:03:39 -07:00
Vladislav Nosivskoy	79c82c5c42	[HiCache] Fix write_backup return type when parent not backed up (#22185) Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com> Co-authored-by: hzh0425 <hzh0425@apache.org>	2026-04-08 16:42:57 +08:00
Sundara Raman Ramachandran	712c8c5051	[Score API] Add SequenceClassification Model support (#22118)	2026-04-08 01:30:58 -07:00
HuangJi	c3c13dd5e3	[diffusion] fix: make warmup image initialization rank-safe (#21817)	2026-04-08 15:51:09 +08:00
Bingxu Chen	de0cfed159	[AMD] Fix DLPack Error in Aiter flydsl GEMM by Detaching MoE Gate Weight (#22262) Co-authored-by: bingxche <binxche@amd.com>	2026-04-07 23:42:10 -07:00
Артем Савкин	cd373667cd	[Bugfix] [NPU] Qwen3.5 with quantization fix (#21692)	2026-04-08 09:15:48 +03:00
Thomas Wang	729b74d8dd	[AMD] Fix GLM-5 fp8 KV quant path dispatch on MI300 (#22314)	2026-04-07 21:16:02 -07:00
yuefeng Wu	4e4b4ac153	[NPU] enable index Cache for npu (#21502)	2026-04-08 11:45:17 +08:00
Alex Nails	493ec91cbe	[CI] Fix stage-b-test-1-gpu-large (0) timeout by reordering LoRA tests and using tokenizer from cache (#22292) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-07 20:00:44 -07:00
Liangsheng Yin	1c5c6dad5e	[tiny] Fix TOCTOU race in pause-aware weight update locking (#22304) Co-authored-by: maocheng23 <maocheng@berkeley.edu> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-07 18:54:28 -07:00
Mick	eca62ab8f4	UX: clean loggings (#22174)	2026-04-08 09:46:38 +08:00
maocheng23	6c2a759a04	[fix] Fix writer lock deadlock in update_weights_from_ipc during pause_generation (#22290) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-07 18:32:56 -07:00
Trevor Morris	7546d04c81	[NVIDIA] Enable FP4 flashinfer trtllm routed moe (#21240)	2026-04-07 16:16:29 -07:00
Liangsheng Yin	0e2a0260a1	Add fast-fail to multimodal-gen CI (#22284)	2026-04-07 15:56:12 -07:00
Thomas Wang	671fe73961	Reduce unnecessary kernels and copies in the NSA indexer (#22232)	2026-04-07 15:37:08 -07:00
David Wang	f08726fd56	[Feature] Add DFLASH speculative decoding support (#22077) Co-authored-by: Jian Chen <141193260+jianc99@users.noreply.github.com> Co-authored-by: Zhijian Liu <5782437+zhijian-liu@users.noreply.github.com> Co-authored-by: Richard Gong <8001209+gongy@users.noreply.github.com> Co-authored-by: David Wang <21328423+dcw02@users.noreply.github.com> Co-authored-by: yilian49 <43861414+yilian49@users.noreply.github.com> Co-authored-by: xm:D <38322020+xiaomin-d@users.noreply.github.com>	2026-04-07 14:48:51 -07:00
Liangsheng Yin	cc35714b03	[tiny] migrate /get_server_info; print accept length in accuracy tests (#22282)	2026-04-07 13:08:35 -07:00
Rain Jiang	1a8eb890f6	Kernels community fa3 (#20796)	2026-04-07 12:48:44 -07:00
huangtingwei	0c204fbd57	[HiSparse] Optimize the scheduling of decode backup. (#21932) Co-authored-by: hzh0425 <hzh0425@apache.org> Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>	2026-04-07 10:34:58 -07:00
khalilzhk	6131fb5882	[NPU] enable mla prepare fused kernel only when being mla attn (#22024)	2026-04-08 00:49:16 +08:00
Ke Bao	be42fbbbd7	Support HTTP2 server (#21700)	2026-04-08 00:42:52 +08:00
shuwenn	ec5742f4ab	fix: Auto-correct page_size for Mamba no_buffer radix cache mode (#20538)	2026-04-08 00:19:31 +08:00
Henson-Zh-Ali	727a182067	[Mamba] eliminate D2H if tracking mamba states (#20522) Co-authored-by: hzh0425 <hzh0425@apache.org>	2026-04-08 00:17:26 +08:00
YAMY	5ae00ecd48	[Disagg][NIXL] Support Mamba state slice transfer for heterogeneous TP (Step 2/2 for Qwen3.5) (#22240)	2026-04-07 23:47:31 +08:00
Mick	e7bc23cdab	[diffusion] CI: fix consistency check (#22251)	2026-04-07 23:43:18 +08:00
Yujun Dong	233f3e31bf	fix(pcg,mm): fix zeroing of input_embeds when replay PCG (#22229)	2026-04-07 20:33:17 +08:00
Xingyu Liu	98f38b14df	Add registration API for external linear attention backend (#21983) Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>	2026-04-07 02:47:40 -07:00
Nicolas Castet	490fa9fa44	[Perf] Restore torch.compile fusion for topk postprocessing (#21771)	2026-04-07 01:38:38 -07:00
YAMY	3148742ddb	[Disagg][NIXL] Fix heterogeneous TP KV transfer for non-MLA models (same logic with mooncake, Step 1/2 for Qwen3.5 support) (#22145)	2026-04-07 14:52:02 +08:00
amote-i	3f7dfba419	fix qwen2_5_math_rm_72b (#21295)	2026-04-07 14:36:57 +08:00
Aditya Sharma	f6e85676b5	model: support qwen3-asr (#22073) Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>	2026-04-07 13:27:05 +08:00
Chang Min Bark	a757c1e3fb	[Apple Silicon] [MLX] Add mlx and mlx-lm dependencies (#22162) Co-authored-by: R0CKSTAR <yeahdongcn@gmail.com>	2026-04-07 11:36:43 +08:00
Xinyuan Tong	2813cb6d9a	[New Model] Gemma 4 (#21952) Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com> Co-authored-by: Pengyu Chen <pychen96@gmail.com> Co-authored-by: kpham-sgl <khoa.pham@radixark.ai> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Andy Luo <andy.luo@amd.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: adarshxs <adarsh.shirawalmath@gmail.com>	2026-04-06 20:24:44 -07:00

7586 Commits