sglang

Mirror of https://github.com/kvcache-ai/sglang.git, synced 2026-07-03 22:07:12 +00:00

Author	SHA1	Message	Date
Cheng Wan	38ee749dd9	Fix adjust_num_token_non_padded_for_attn_tp returning CPU tensor (#19051)	2026-02-20 23:23:38 +08:00
Nicolas Castet	3358ba8945	[Fix] Run FlashInfer autotune on non-default stream for NCCL 2.29+ compatibility (#18987)	2026-02-20 23:21:38 +08:00
DarkSharpness	52852404c8	[Fix] DO NOT skip save_kv_cache for dllm (#19020)	2026-02-20 23:20:29 +08:00
Mohammad Miadh Angkad	f23a23cc05	Fix NSA FP8 KV cache path for both-trtllm MHA one-shot (#18931) Co-authored-by: rainj-me <96632942+rainj-me@users.noreply.github.com>	2026-02-20 22:00:09 +08:00
Mick	8d789b5c3d	[diffusion] feat: support nunchaku for Z-Image-Turbo and flux.1 (int4) (#18959)	2026-02-20 21:16:08 +08:00
Yuan Luo	7d953440ec	[jit kernel] Support per_token_group_quant_8bit jit kernel (#18905) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>	2026-02-20 21:01:05 +08:00
Mick	38a69652e6	[diffusion] logging: log available mem when each stage starts in debug level (#18998)	2026-02-20 19:57:06 +08:00
fzyzcjy	0d20cf5a66	Fix lint on main (#19054)	2026-02-20 15:45:24 +08:00
Cheng Wan	b59a22f781	fix lint on main (#19052)	2026-02-20 15:30:57 +08:00
Duyi-Wang	b0786cdf94	[AMD] Replace msgpack with msgspec in MORI-IO (#19007)	2026-02-19 23:04:15 -08:00
YAMY	8541b1118d	[Fix][Qwen3.5] Pass max_mamba_cache_size to mamba pool in disaggregation decode path (#19002)	2026-02-20 14:31:26 +08:00
chengshuang18	295bc17576	Feature/sdar support (#19044) Co-authored-by: root <root@gpu-lg-cmc-h-h200-3047.host.h.pjlab.org.cn> Co-authored-by: chengshuang <chengshuang@pjlab.org.cn> Co-authored-by: 赵晨阳 <zhaochen20@outlook.com>	2026-02-19 21:58:15 -08:00
fzyzcjy	046ef0aa35	Support using SGLang port in dumper (#19038)	2026-02-20 12:30:24 +08:00
fzyzcjy	2fecc2c075	Support resetting and enhance HTTP endpoints for dumper (#19046) Co-authored-by: Yueming Yuan <112649537+yueming-yuan@users.noreply.github.com>	2026-02-20 12:29:09 +08:00
fzyzcjy	503bf3047a	Enhance configure and env parsing in dumper (#19034)	2026-02-20 12:28:10 +08:00
fzyzcjy	df995aab56	Support filtering labels in dumper (#19018)	2026-02-20 12:27:12 +08:00
fzyzcjy	261bca3c58	Support captured dump output and console output control in dumper (#19017)	2026-02-20 12:26:24 +08:00
fzyzcjy	fc1500adc6	Hint users when wrongly execute it with partial ranks in dumper (#19014)	2026-02-20 12:25:54 +08:00
fzyzcjy	b41d412c3d	Support cleanup previous dumps in dumper (#19013)	2026-02-20 12:25:21 +08:00
Cheng Wan	13a4a0406e	Fix flashinfer autotune to only wrap run_once() (#19004)	2026-02-19 20:02:21 -08:00
Cheng Wan	64bca5315f	Fix long prompt KV allocation by falling back to torch native APIs when exceeding Triton tensor limit (#18250)	2026-02-19 19:15:05 -08:00
Nicolas Castet	99df920cdb	Register tensors with symmetric memory for qwen (#18643)	2026-02-20 09:32:32 +08:00
Cheng Wan	73a7f0d049	Revert "Add SDAR model support" (#19032)	2026-02-19 16:03:56 -08:00
Liangsheng Yin	db34c1cbfb	Tiny remove duplicate coredump env injection (#19023)	2026-02-19 13:26:30 -08:00
Liangsheng Yin	5ff5aa6923	[spec v2]Fix torch gc of future indices (#18958)	2026-02-19 11:38:25 -08:00
chengshuang18	44ab752b7a	Add SDAR model support (#18318) Co-authored-by: root <root@gpu-lg-cmc-h-h200-3047.host.h.pjlab.org.cn> Co-authored-by: chengshuang <chengshuang@pjlab.org.cn> Co-authored-by: 赵晨阳 <zhaochen20@outlook.com>	2026-02-19 11:20:32 -08:00
Mick	3207427d6d	[diffusion] CI: enable warmup as default (#19010)	2026-02-19 23:27:23 +08:00
Mick	d73f06f091	[diffusion] chore: improve memory usage on consumer-level GPU (#18997)	2026-02-19 21:59:49 +08:00
satyamk7054	963def7f26	Move lora request validation to tokenizer_manager from server (#18962) Co-authored-by: Satyam Kumar <satyamk@linkedin.com>	2026-02-19 21:03:19 +08:00
Makcum888e	d07e8aa4a3	[Diffusion] [NPU] Enable profiler on NPU (#17807)	2026-02-19 15:33:51 +03:00
Prozac614	e21fc78dbd	[diffusion] fix: fix rank used in parallel executor when enable_cfg_parallel is false (#18975) Co-authored-by: daiweitao <dwti614707404@163.com>	2026-02-19 20:12:24 +08:00
Xiaoyu Zhang	19aa19b111	[diffusion] refactor: refactor diffusion triton kernels (#18966)	2026-02-19 17:03:44 +08:00
pansicheng	48642d5384	[RadixTree][4/N Refactor]: Move available_and_evictable_str to individual radix cache classes (#17852)	2026-02-19 17:03:15 +08:00
shaharmor98	82a0bafc1c	Feat/add fi selective state update kernel call (#18070) Signed-off-by: Shahar Mor <smor@nvidia.com>	2026-02-19 16:56:06 +08:00
Yuwei An	0be30d4b0d	Fix PCG MoE Error (#17739)	2026-02-19 16:48:06 +08:00
hlu1	bba2fc49a1	[Qwen3.5] Enable nvfp4 checkpoint (#18937)	2026-02-19 12:24:05 +08:00
hxie	443b1a88d1	Add batched zero copy to NIXL backend (#18850) Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>	2026-02-18 16:31:02 -08:00
Bingxu Chen	462267982b	[AMD] Fix mi35x dsv32 mtp nightly (#18978) Co-authored-by: michaelzhang-ai <michaelzhang-ai@users.noreply.github.com>	2026-02-18 16:23:17 -08:00
Alison Shao	e2fccb2ee0	Fix flaky Qwen3-Next KL divergence tests by reverting mamba slot release (#18910)	2026-02-19 07:55:16 +08:00
Mengyang Liu	4f980f6f23	[Feature] Implement update_weights_from_disk for SGLang-D (Diffusion … (#18306) Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>	2026-02-18 11:24:07 -08:00
Tamir Baydasov	150ed881be	[4/N] Quantization Refactor: Quark MoE schemes (#18252) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Peng Zhang <aniz1905@gmail.com> Co-authored-by: ronnie_zheng <zl19940307@163.com>	2026-02-18 19:44:30 +03:00
Ethan (Yusheng) Su	9c5aae4df5	[Fix] Add lora tied lm head support (for Qwen2.5, Gemma, etc model need) (#18634)	2026-02-19 00:34:51 +08:00
Yuhao Yang	5a7ae059e3	Add DP ViT support for Kimi K2.5 (#18689)	2026-02-18 23:03:07 +08:00
zijiexia	eb6ff4a940	[diffusion] fix: refactor task resolution logic in benchmark function for multimodal generation (#18948)	2026-02-18 20:57:42 +08:00
Xiaoyu Zhang	390c154306	[Tiny fix] Super tiny fix mul_add naive forward bug (#18964)	2026-02-18 16:18:43 +08:00
Xiaoyu Zhang	513c12d23f	Remove unused fast-hadamard-transform PyTorch extension sources (#18927)	2026-02-18 15:51:07 +08:00
Mick	420a611275	[diffusion] refactor: unify SamplingParams construction and improve DiffGenerator return types (#18928)	2026-02-18 14:56:58 +08:00
DarkSharpness	9d138685c1	[Refactor] Fix test and clean up hicache code (#18555)	2026-02-18 14:37:46 +08:00
William Arnold	95c44cea29	[feat] Add return_routed_experts param to async_generate for parity with generate (#18508)	2026-02-17 22:11:19 -08:00
Neal Vaidya	ac0e493329	feat: add nsa and swa disagg support with nixl (#18939) Signed-off-by: Neal Vaidya <nealv@nvidia.com>	2026-02-18 13:13:26 +08:00

6403 Commits