sglang

mirror of https://github.com/kvcache-ai/sglang.git synced 2026-06-30 19:57:52 +00:00

Author	SHA1	Message	Date
Shu Wang	5638d40f3a	[nvidia] Gemma4 nvfp4 fix (#22079)	2026-04-10 08:44:34 +08:00
Tarushii Goel	cebd9c2a1e	[sgl] add ability to return logprobs in MultiLayerEagleWorkerV2 (#22241)	2026-04-09 16:20:55 -07:00
Mohammad Miadh Angkad	c3833ba929	Enable DFLASH support for additional model backends (#22358) Co-authored-by: David Wang <21328423+dcw02@users.noreply.github.com>	2026-04-09 14:36:12 -07:00
Ethan (Yusheng) Su	28ef6de091	[Lora] Lora quat info re-factor and support deepseekv3 mla lora (#22323)	2026-04-09 14:19:58 -07:00
Baizhou Zhang	60acdc31f2	[Fix] Fix several bugs on DSA models (#22430)	2026-04-09 12:46:23 -07:00
Baizhou Zhang	606aa11ea8	[DSA] Enable all reduce fusion for DSA models (#22390)	2026-04-09 12:42:44 -07:00
Lawrence Wu	8eb235ab51	fix: do not strip whitespace from GLM tool call values (#20543) Co-authored-by: Kangyan-Zhou <zky314343421@gmail.com>	2026-04-09 11:14:15 -07:00
Lishan H	8b991d98a1	[feature] asr: add chunk-based streaming ASR for Qwen3-ASR (#22089)	2026-04-10 01:49:03 +08:00
billishyahao	1df9f4e2f6	[AMD] Add prealloc token env for mori-ep (#22329)	2026-04-09 09:34:35 -07:00
Jonah Bernard	8216b921a1	Add MLX profiling to bench_one_batch.py (#22159)	2026-04-09 20:45:21 +08:00
Liangsheng Yin	9fed58805f	[Doc] Clarify SWA `HybridSWAPoolConfigurator` comments on all-SWA vs hybrid semantics (#22443)	2026-04-09 03:02:16 -07:00
YMbmzy	8a67fb20ea	[Speculative] Support penalty for spec v2 overlap scheduling (#22049)	2026-04-09 01:59:04 -07:00
Thomas Wang	628df31d08	[AMD] Use aiter CK layernorm2d for LayerNorm to reduce NSA indexer kernel launches (#22424)	2026-04-09 01:55:29 -07:00
xutizhou	57ffc55fb6	feat: [1/2] [DeepEP] Fuse shared expert into MoE dispatch under EP (#20089) Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: AichenF <aichenf@nvidia.com>	2026-04-09 01:48:28 -07:00
Mick	9709192ce9	[diffusion] feat: support FLUX.2-small-decoder (#22414)	2026-04-09 15:53:14 +08:00
Liangsheng Yin	de441ac6bb	[core] Introduce `MemoryPoolConfigurator` class hierarchy (#22389)	2026-04-09 15:29:19 +08:00
Evgueni Petrov	b9c316917b	fix AttributeError: 'LazyValue' object has no attribute 'keys' in eplb_manager.py for qwen3 moe (#21822)	2026-04-09 00:13:29 -07:00
Nicolas Castet	e379befbac	Add symmetric debug mode to print stack trace of comm ops with unregistered tensors (#18569)	2026-04-08 22:34:58 -07:00
Bingxu Chen	6b96f8341d	[AMD] Fix multimodal diffusion test crash on ROCm by falling back to SDPA (#22335)	2026-04-08 22:32:49 -07:00
Mick	355fcbcc17	[diffusion] fix: fix cache dit refresh none mask (#22374)	2026-04-09 11:58:24 +08:00
jsheng_Linkedin	6838a23226	[Feature] Add token embedding overrides for sparse embedding replacement (#20960) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-08 20:51:36 -07:00
Kurkur	a69be2e866	[Feature] Support eagle3 for qwen3-vl (#22230)	2026-04-09 11:45:36 +08:00
Lianmin Zheng	ddc8ef1038	Lazy import flash_attention_v4 to avoid loading flash_attn.cute at startup (#22306)	2026-04-08 20:40:25 -07:00
Khoa Pham	f127d67823	[Spec][Ngram] Misc enhance support for multiple SAMs (#22294)	2026-04-08 19:56:23 -07:00
Kangrui Du	1b7c33a5b7	[diffusion] rl: revamp rollout Log-Prob support with SDE/CPS for RL post-training (#21204) Co-authored-by: MikukuOvO <mikukuovo@gmail.com>	2026-04-09 09:00:00 +08:00
Liangsheng Yin	1e3f6ebea6	[core] Extract pool sizing logic to pool_configurator.py (#22384)	2026-04-08 16:13:21 -07:00
Baizhou Zhang	4e5b8cb041	Fix get_version_tag.py to handle dot-separated post versions (#22385) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-08 15:18:22 -07:00
sglang-bot	df3275bd6c	chore: bump flashinfer version to 0.6.7.post3 (#22382) Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>	2026-04-08 14:49:45 -07:00
Yufeng He	c89afaea7c	Fix hybrid_linear_attn_backend crash with ngram speculation (#20739) Co-authored-by: kpham-sgl <khoa.pham@radixark.ai> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-08 12:52:07 -07:00
YAMY	c26b8b4a4b	[GDN] Remove FlashInfer GDN decode + no_buffer guard and default to FlashInfer on SM100+ (#21861)	2026-04-08 11:59:15 -07:00
Kurt Shuster	db30a63a13	[sgl-kernel] support > 1024 experts in moe_align_block_size kernel (#21610)	2026-04-08 11:45:13 -07:00
Mick	4ac6fa0d87	[diffusion] fix: fix loading multiple ckpts with different precision for a same module (#22360)	2026-04-09 02:44:19 +08:00
Yihao Wang	a5ed507a16	[refactor] [asr] add transcription adapter for extensible ASR models support (#22181)	2026-04-09 01:19:37 +08:00
Yihao Wang	ae8da14ea3	[fix] [whisper] ensure inputs are moved to the correct device before processing. (#22293)	2026-04-08 23:45:42 +08:00
Xiaoyu Zhang	b5b2dbe05f	[Diffusion] Add diffusion NVFP4 scaled-mm correctness test (#22127) Co-authored-by: Mick <mickjagger19@icloud.com>	2026-04-08 22:07:24 +08:00
zhaozx-cn	33c9cc8994	[NPU] fix qwen3.5 video processor (#22266)	2026-04-08 21:13:29 +08:00
Fergus	413913763f	fix: wrap _import_static_state in inference_mode to fix resume on Blackwell (#21035)	2026-04-08 02:03:39 -07:00
Vladislav Nosivskoy	79c82c5c42	[HiCache] Fix write_backup return type when parent not backed up (#22185) Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com> Co-authored-by: hzh0425 <hzh0425@apache.org>	2026-04-08 16:42:57 +08:00
Sundara Raman Ramachandran	712c8c5051	[Score API] Add SequenceClassification Model support (#22118)	2026-04-08 01:30:58 -07:00
HuangJi	c3c13dd5e3	[diffusion] fix: make warmup image initialization rank-safe (#21817)	2026-04-08 15:51:09 +08:00
Bingxu Chen	de0cfed159	[AMD] Fix DLPack Error in Aiter flydsl GEMM by Detaching MoE Gate Weight (#22262) Co-authored-by: bingxche <binxche@amd.com>	2026-04-07 23:42:10 -07:00
Артем Савкин	cd373667cd	[Bugfix] [NPU] Qwen3.5 with quantization fix (#21692)	2026-04-08 09:15:48 +03:00
Thomas Wang	729b74d8dd	[AMD] Fix GLM-5 fp8 KV quant path dispatch on MI300 (#22314)	2026-04-07 21:16:02 -07:00
yuefeng Wu	4e4b4ac153	[NPU] enable index Cache for npu (#21502)	2026-04-08 11:45:17 +08:00
Alex Nails	493ec91cbe	[CI] Fix stage-b-test-1-gpu-large (0) timeout by reordering LoRA tests and using tokenizer from cache (#22292) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-07 20:00:44 -07:00
Liangsheng Yin	1c5c6dad5e	[tiny] Fix TOCTOU race in pause-aware weight update locking (#22304) Co-authored-by: maocheng23 <maocheng@berkeley.edu> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-07 18:54:28 -07:00
Mick	eca62ab8f4	UX: clean loggings (#22174)	2026-04-08 09:46:38 +08:00
maocheng23	6c2a759a04	[fix] Fix writer lock deadlock in update_weights_from_ipc during pause_generation (#22290) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-07 18:32:56 -07:00
Trevor Morris	7546d04c81	[NVIDIA] Enable FP4 flashinfer trtllm routed moe (#21240)	2026-04-07 16:16:29 -07:00
Liangsheng Yin	0e2a0260a1	Add fast-fail to multimodal-gen CI (#22284)	2026-04-07 15:56:12 -07:00

7855 Commits