sglang

mirror of https://github.com/kvcache-ai/sglang.git synced 2026-07-03 13:57:04 +00:00

Author	SHA1	Message	Date
fzyzcjy	261bca3c58	Support captured dump output and console output control in dumper (#19017 )	2026-02-20 12:26:24 +08:00
fzyzcjy	fc1500adc6	Hint users when wrongly execute it with partial ranks in dumper (#19014 )	2026-02-20 12:25:54 +08:00
fzyzcjy	b41d412c3d	Support cleanup previous dumps in dumper (#19013 )	2026-02-20 12:25:21 +08:00
Cheng Wan	13a4a0406e	Fix flashinfer autotune to only wrap run_once() (#19004 )	2026-02-19 20:02:21 -08:00
Cheng Wan	64bca5315f	Fix long prompt KV allocation by falling back to torch native APIs when exceeding Triton tensor limit (#18250 )	2026-02-19 19:15:05 -08:00
Nicolas Castet	99df920cdb	Register tensors with symmetric memory for qwen (#18643 )	2026-02-20 09:32:32 +08:00
Cheng Wan	73a7f0d049	Revert "Add SDAR model support" (#19032 )	2026-02-19 16:03:56 -08:00
Liangsheng Yin	db34c1cbfb	Tiny remove duplicate coredump env injection (#19023 )	2026-02-19 13:26:30 -08:00
Liangsheng Yin	5ff5aa6923	[spec v2]Fix torch gc of future indices (#18958 )	2026-02-19 11:38:25 -08:00
chengshuang18	44ab752b7a	Add SDAR model support (#18318 ) Co-authored-by: root <root@gpu-lg-cmc-h-h200-3047.host.h.pjlab.org.cn> Co-authored-by: chengshuang <chengshuang@pjlab.org.cn> Co-authored-by: 赵晨阳 <zhaochen20@outlook.com>	2026-02-19 11:20:32 -08:00
Mick	3207427d6d	[diffusion] CI: enable warmup as default (#19010 )	2026-02-19 23:27:23 +08:00
Mick	d73f06f091	[diffusion] chore: improve memory usage on consumer-level GPU (#18997 )	2026-02-19 21:59:49 +08:00
satyamk7054	963def7f26	Move lora request validation to tokenizer_manager from server (#18962 ) Co-authored-by: Satyam Kumar <satyamk@linkedin.com>	2026-02-19 21:03:19 +08:00
Makcum888e	d07e8aa4a3	[Diffusion] [NPU] Enable profiler on NPU (#17807 )	2026-02-19 15:33:51 +03:00
Prozac614	e21fc78dbd	[diffusion] fix: fix rank used in parallel executor when enable_cfg_parallel is false (#18975 ) Co-authored-by: daiweitao <dwti614707404@163.com>	2026-02-19 20:12:24 +08:00
Xiaoyu Zhang	19aa19b111	[diffusion] refactor: refactor diffusion triton kernels (#18966 )	2026-02-19 17:03:44 +08:00
pansicheng	48642d5384	[RadixTree][4/N Refactor]: Move available_and_evictable_str to individual radix cache classes (#17852 )	2026-02-19 17:03:15 +08:00
shaharmor98	82a0bafc1c	Feat/add fi selective state update kernel call (#18070 ) Signed-off-by: Shahar Mor <smor@nvidia.com>	2026-02-19 16:56:06 +08:00
Yuwei An	0be30d4b0d	Fix PCG MoE Error (#17739 )	2026-02-19 16:48:06 +08:00
hlu1	bba2fc49a1	[Qwen3.5] Enable nvfp4 checkpoint (#18937 )	2026-02-19 12:24:05 +08:00
hxie	443b1a88d1	Add batched zero copy to NIXL backend (#18850 ) Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>	2026-02-18 16:31:02 -08:00
Bingxu Chen	462267982b	[AMD] Fix mi35x dsv32 mtp nightly (#18978 ) Co-authored-by: michaelzhang-ai <michaelzhang-ai@users.noreply.github.com>	2026-02-18 16:23:17 -08:00
Alison Shao	e2fccb2ee0	Fix flaky Qwen3-Next KL divergence tests by reverting mamba slot release (#18910 )	2026-02-19 07:55:16 +08:00
Mengyang Liu	4f980f6f23	[Feature] Implement update_weights_from_disk for SGLang-D (Diffusion … (#18306 ) Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>	2026-02-18 11:24:07 -08:00
Tamir Baydasov	150ed881be	[4/N] Quantization Refactor: Quark MoE schemes (#18252 ) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Peng Zhang <aniz1905@gmail.com> Co-authored-by: ronnie_zheng <zl19940307@163.com>	2026-02-18 19:44:30 +03:00
Ethan (Yusheng) Su	9c5aae4df5	[Fix] Add lora tied lm head support (for Qwen2.5, Gemma, etc model need) (#18634 )	2026-02-19 00:34:51 +08:00
Yuhao Yang	5a7ae059e3	Add DP ViT support for Kimi K2.5 (#18689 )	2026-02-18 23:03:07 +08:00
zijiexia	eb6ff4a940	[diffusion] fix: refactor task resolution logic in benchmark function for multimodal generation (#18948 )	2026-02-18 20:57:42 +08:00
Xiaoyu Zhang	390c154306	[Tiny fix] Super tiny fix mul_add naive forward bug (#18964 )	2026-02-18 16:18:43 +08:00
Xiaoyu Zhang	513c12d23f	Remove unused fast-hadamard-transform PyTorch extension sources (#18927 )	2026-02-18 15:51:07 +08:00
Mick	420a611275	[diffusion] refactor: unify SamplingParams construction and improve DiffGenerator return types (#18928 )	2026-02-18 14:56:58 +08:00
DarkSharpness	9d138685c1	[Refactor] Fix test and clean up hicache code (#18555 )	2026-02-18 14:37:46 +08:00
William Arnold	95c44cea29	[feat] Add return_routed_experts param to async_generate for parity with generate (#18508 )	2026-02-17 22:11:19 -08:00
Neal Vaidya	ac0e493329	feat: add nsa and swa disagg support with nixl (#18939 ) Signed-off-by: Neal Vaidya <nealv@nvidia.com>	2026-02-18 13:13:26 +08:00
Liangsheng Yin	2d85f01d43	Revert "Fix generated-shared-prefix bench_serving" (#18956 )	2026-02-17 20:43:55 -08:00
Zheng Li	fa5698d791	feat: [Qwen3.5] Support block-wise FP8 quantization and model adaptation (#18926 )	2026-02-18 11:44:25 +08:00
Yan Ru Pei	83e24e2eb4	Expose priority parameter in Engine.generate() and Engine.async_generate() (#18944 ) Signed-off-by: PeaBrane <peabrane@gmail.com> Co-authored-by: Cursor <cursoragent@cursor.com>	2026-02-17 19:14:22 -08:00
Alison Shao	34d975b18f	Fix eval tests not capturing server launch failures (#18886 )	2026-02-18 07:59:03 +08:00
Lianmin Zheng	e02a9bec8d	Refactor sampler: Use a better hash function for deterministic sampling and clear dispatch for probs/logprobs/logits sampling paths (#18915 ) Co-authored-by: Sehoon Kim <sehoon@x.ai>	2026-02-17 15:41:23 -08:00
Liangsheng Yin	83a475e8d7	feat: add cuda core dump CI warpper (#18909 )	2026-02-17 14:49:26 -08:00
Ratish P	9a7d6be567	cleanup prefill metrics logging to fix dp-attn metrics (#18778 ) Co-authored-by: Liangsheng Yin <lsyincs@gmail.com>	2026-02-17 14:35:35 -08:00
Qiaolin Yu	3c601db031	Fix generated-shared-prefix bench_serving (#18769 )	2026-02-17 14:00:22 -08:00
Nickcp39	48fcd62d1f	fix(glm-image): single-GPU T5 config + SP support for 4D latents (#18… (#18739 ) Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: 赵晨阳 <zhaochen20@outlook.com>	2026-02-17 13:04:32 -08:00
Tamir Baydasov	aeca7d348c	[3/N] Quantization Refactor: ModelSlim MoE schemes (#17993 ) Co-authored-by: ronnie_zheng <zl19940307@163.com>	2026-02-17 21:38:27 +03:00
Simo Lin	bf08d3f43c	[gRPC] Fix scheduler startup broken by context parallel refactor (#18933 ) Co-authored-by: Chang Su <chang.s.su@oracle.com>	2026-02-17 08:52:11 -08:00
triple-mu	504b2c58cf	[diffusion] improve: improve torch.compile for MOVA (#18914 ) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-02-18 00:47:38 +08:00
Minglei Zhu	bf52388354	[PCG] support piecewise cuda graph for kimi-linear model (#18849 )	2026-02-17 23:31:12 +08:00
Mick	bfe34c90ff	Revert "[diffusion] operator: unify rotary embedding impl" (#18929 )	2026-02-17 22:56:04 +08:00
Makcum888e	2aa0db7d9c	[Diffusion] [NPU] Fix CI run (#18921 )	2026-02-17 16:54:19 +03:00
HAI	8bb1037796	ROCm use rotary_embedding from sgl-kernel (#18920 )	2026-02-17 03:00:37 -08:00

6437 Commits