sglang

mirror of https://github.com/kvcache-ai/sglang.git synced 2026-07-03 13:57:04 +00:00

Author	SHA1	Message	Date
Nicolas Castet	51b3ed02ca	Fix bug in symm mem pre-allocation default (#19082)	2026-02-21 11:51:28 +08:00
Vladislav Nosivskoy	afd91e8782	[DSv32] Fix MTP and CP compatability (#19062) Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>	2026-02-21 11:10:34 +08:00
Lianmin Zheng	463baafe10	[Auto Sync] Update batch_invariant_ops.py (20260221) (#19098) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Jiayi Yuan <34369239+jy-yuan@users.noreply.github.com>	2026-02-20 18:27:31 -08:00
Lianmin Zheng	2928dfb8fa	[Auto Sync] Update bench_one_batch_server_internal.py (20260221) (#19097) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Hanming Lu <69857889+hanming-lu@users.noreply.github.com>	2026-02-20 18:19:14 -08:00
Cheng Wan	84c67c8be0	Refactor graph input buffers (#18991)	2026-02-20 18:09:31 -08:00
Minglei Zhu	4bffd3a232	[GPT-OSS] support fp8 online quantization for gpt-oss bf16 (#18988) merge it as all required CI passed	2026-02-20 14:16:57 -08:00
Qiaolin Yu	96bae2355e	Add generated-shared-prefix dataset in bench_one_batch (#18986)	2026-02-20 13:33:10 -08:00
0xNullPath	ab18734375	[feat] feat: support swa in trtllm_mha (#18970)	2026-02-21 01:39:29 +08:00
billishyahao	fbb6098487	[AMD] support two batch overlapping for mori ep (#17953) Co-authored-by: kkHuang-amd <wunhuang@amd.com> Co-authored-by: Feiyue Zhai <feiyue.zhai@amd.com> Co-authored-by: Duyi-Wang <duyi.wang@amd.com> Co-authored-by: HAI <hixiao@gmail.com>	2026-02-20 08:45:55 -08:00
Cheng Wan	38ee749dd9	Fix adjust_num_token_non_padded_for_attn_tp returning CPU tensor (#19051)	2026-02-20 23:23:38 +08:00
Nicolas Castet	3358ba8945	[Fix] Run FlashInfer autotune on non-default stream for NCCL 2.29+ compatibility (#18987)	2026-02-20 23:21:38 +08:00
DarkSharpness	52852404c8	[Fix] DO NOT skip save_kv_cache for dllm (#19020)	2026-02-20 23:20:29 +08:00
Mohammad Miadh Angkad	f23a23cc05	Fix NSA FP8 KV cache path for both-trtllm MHA one-shot (#18931) Co-authored-by: rainj-me <96632942+rainj-me@users.noreply.github.com>	2026-02-20 22:00:09 +08:00
Mick	8d789b5c3d	[diffusion] feat: support nunchaku for Z-Image-Turbo and flux.1 (int4) (#18959)	2026-02-20 21:16:08 +08:00
Yuan Luo	7d953440ec	[jit kernel] Support per_token_group_quant_8bit jit kernel (#18905) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>	2026-02-20 21:01:05 +08:00
Mick	38a69652e6	[diffusion] logging: log available mem when each stage starts in debug level (#18998)	2026-02-20 19:57:06 +08:00
fzyzcjy	0d20cf5a66	Fix lint on main (#19054)	2026-02-20 15:45:24 +08:00
Cheng Wan	b59a22f781	fix lint on main (#19052)	2026-02-20 15:30:57 +08:00
Duyi-Wang	b0786cdf94	[AMD] Replace msgpack with msgspec in MORI-IO (#19007)	2026-02-19 23:04:15 -08:00
YAMY	8541b1118d	[Fix][Qwen3.5] Pass max_mamba_cache_size to mamba pool in disaggregation decode path (#19002)	2026-02-20 14:31:26 +08:00
chengshuang18	295bc17576	Feature/sdar support (#19044) Co-authored-by: root <root@gpu-lg-cmc-h-h200-3047.host.h.pjlab.org.cn> Co-authored-by: chengshuang <chengshuang@pjlab.org.cn> Co-authored-by: 赵晨阳 <zhaochen20@outlook.com>	2026-02-19 21:58:15 -08:00
fzyzcjy	046ef0aa35	Support using SGLang port in dumper (#19038)	2026-02-20 12:30:24 +08:00
fzyzcjy	2fecc2c075	Support resetting and enhance HTTP endpoints for dumper (#19046) Co-authored-by: Yueming Yuan <112649537+yueming-yuan@users.noreply.github.com>	2026-02-20 12:29:09 +08:00
fzyzcjy	503bf3047a	Enhance configure and env parsing in dumper (#19034)	2026-02-20 12:28:10 +08:00
fzyzcjy	df995aab56	Support filtering labels in dumper (#19018)	2026-02-20 12:27:12 +08:00
fzyzcjy	261bca3c58	Support captured dump output and console output control in dumper (#19017)	2026-02-20 12:26:24 +08:00
fzyzcjy	fc1500adc6	Hint users when wrongly execute it with partial ranks in dumper (#19014)	2026-02-20 12:25:54 +08:00
fzyzcjy	b41d412c3d	Support cleanup previous dumps in dumper (#19013)	2026-02-20 12:25:21 +08:00
Cheng Wan	13a4a0406e	Fix flashinfer autotune to only wrap run_once() (#19004)	2026-02-19 20:02:21 -08:00
Cheng Wan	64bca5315f	Fix long prompt KV allocation by falling back to torch native APIs when exceeding Triton tensor limit (#18250)	2026-02-19 19:15:05 -08:00
Nicolas Castet	99df920cdb	Register tensors with symmetric memory for qwen (#18643)	2026-02-20 09:32:32 +08:00
Cheng Wan	73a7f0d049	Revert "Add SDAR model support" (#19032)	2026-02-19 16:03:56 -08:00
Liangsheng Yin	db34c1cbfb	Tiny remove duplicate coredump env injection (#19023)	2026-02-19 13:26:30 -08:00
Liangsheng Yin	5ff5aa6923	[spec v2]Fix torch gc of future indices (#18958)	2026-02-19 11:38:25 -08:00
chengshuang18	44ab752b7a	Add SDAR model support (#18318) Co-authored-by: root <root@gpu-lg-cmc-h-h200-3047.host.h.pjlab.org.cn> Co-authored-by: chengshuang <chengshuang@pjlab.org.cn> Co-authored-by: 赵晨阳 <zhaochen20@outlook.com>	2026-02-19 11:20:32 -08:00
Mick	3207427d6d	[diffusion] CI: enable warmup as default (#19010)	2026-02-19 23:27:23 +08:00
Mick	d73f06f091	[diffusion] chore: improve memory usage on consumer-level GPU (#18997)	2026-02-19 21:59:49 +08:00
satyamk7054	963def7f26	Move lora request validation to tokenizer_manager from server (#18962) Co-authored-by: Satyam Kumar <satyamk@linkedin.com>	2026-02-19 21:03:19 +08:00
Makcum888e	d07e8aa4a3	[Diffusion] [NPU] Enable profiler on NPU (#17807)	2026-02-19 15:33:51 +03:00
Prozac614	e21fc78dbd	[diffusion] fix: fix rank used in parallel executor when enable_cfg_parallel is false (#18975) Co-authored-by: daiweitao <dwti614707404@163.com>	2026-02-19 20:12:24 +08:00
Xiaoyu Zhang	19aa19b111	[diffusion] refactor: refactor diffusion triton kernels (#18966)	2026-02-19 17:03:44 +08:00
pansicheng	48642d5384	[RadixTree][4/N Refactor]: Move available_and_evictable_str to individual radix cache classes (#17852)	2026-02-19 17:03:15 +08:00
shaharmor98	82a0bafc1c	Feat/add fi selective state update kernel call (#18070) Signed-off-by: Shahar Mor <smor@nvidia.com>	2026-02-19 16:56:06 +08:00
Yuwei An	0be30d4b0d	Fix PCG MoE Error (#17739)	2026-02-19 16:48:06 +08:00
hlu1	bba2fc49a1	[Qwen3.5] Enable nvfp4 checkpoint (#18937)	2026-02-19 12:24:05 +08:00
hxie	443b1a88d1	Add batched zero copy to NIXL backend (#18850) Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>	2026-02-18 16:31:02 -08:00
Bingxu Chen	462267982b	[AMD] Fix mi35x dsv32 mtp nightly (#18978) Co-authored-by: michaelzhang-ai <michaelzhang-ai@users.noreply.github.com>	2026-02-18 16:23:17 -08:00
Alison Shao	e2fccb2ee0	Fix flaky Qwen3-Next KL divergence tests by reverting mamba slot release (#18910)	2026-02-19 07:55:16 +08:00
Mengyang Liu	4f980f6f23	[Feature] Implement update_weights_from_disk for SGLang-D (Diffusion … (#18306) Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>	2026-02-18 11:24:07 -08:00
Tamir Baydasov	150ed881be	[4/N] Quantization Refactor: Quark MoE schemes (#18252) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Peng Zhang <aniz1905@gmail.com> Co-authored-by: ronnie_zheng <zl19940307@163.com>	2026-02-18 19:44:30 +03:00


6412 Commits