sglang

mirror of https://github.com/kvcache-ai/sglang.git synced 2026-07-01 20:27:57 +00:00

Author	SHA1	Message	Date
赵晨阳	62480ebb1b	[SGLang-Diffusion] Fix custom op fake impl missing eps default for torch.compile (#19725 ) Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>	2026-03-03 15:24:36 +08:00
1StepForever	3c01b44700	[Fix] NPU deepep hccl buffer and fix IPC safe check (#17804 )	2026-03-03 14:56:06 +08:00
Xinyuan Tong	dbf1247fe0	Add KimiK2Detector with tool interruption support (#19696 ) Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>	2026-03-03 14:04:49 +08:00
Yuzhen Zhou	63003a39cf	[BUG] Support tuple hidden_states from fused MXFP4/FP8 quantization (#19643 )	2026-03-02 20:39:06 -08:00
Alison Shao	fe9d85d93c	Fix CompressedTensorsMxInt4MoE abstract method and relax GPQA baseline (#19726 ) Co-authored-by: Alison Shao <alisonshao@Mac.attlocal.net>	2026-03-02 19:03:21 -08:00
huangtingwei	8dfb6e1684	[HiCache] fix compatibility bugs with eagle and HiCacheStorage (#19570 ) Co-authored-by: maoyuhan <susanmao1997@outlook.com>	2026-03-02 18:29:51 -08:00
KnightLTC	1041f240c0	[NPU]grok2 model support (#17119 ) Co-authored-by: cy <chenyang08056032@163.com>	2026-03-03 10:24:10 +08:00
Ratish P	e6e02ec938	[diffusion]: Add model detectors and warning for quantized diffusion models (#18041 )	2026-03-03 09:46:25 +08:00
Xiaoyu Zhang	145ae518ac	[Diffusion] Revert 18619 (#19510 )	2026-03-03 08:15:15 +08:00
Mohammad Miadh Angkad	6822941514	[FlashInfer] Bump FlashInfer version from 0.6.3 to 0.6.4 (#19005 )	2026-03-02 16:12:09 -08:00
Mohammad Miadh Angkad	3f9fc8b848	[Qwen3.5] Fix missing `quant_config` in `Qwen3VL` (#19291 )	2026-03-02 14:07:51 -08:00
Glen Liu	cc860a2198	[TestFix] change LoRA tests to use NVIDIA adapter instead of Nutanix (#19642 ) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-03-02 12:55:41 -08:00
Xiaoyu Zhang	51ee17ce44	[diffusion] move skills dir (#19697 )	2026-03-03 02:51:29 +08:00
Makcum888e	05950853bc	[Diffusion] [NPU] Add CI tests for FLUX (#19001 ) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: ronnie_zheng <zl19940307@163.com>	2026-03-02 20:40:22 +03:00
Yuwei An	c64274c746	Piecewise Cuda Graph set default (#16331 )	2026-03-02 23:18:07 +08:00
hlu1	468e3dc56b	[Qwen3.5] Set full attn_backend to trtllm_mha on SM100 by default when possible (#19030 )	2026-03-02 23:14:53 +08:00
0xNullPath	2d183c4e6d	[Feat] add PP Support for minimax-m2 series (#19577 )	2026-03-02 23:13:59 +08:00
Ruihang Li	5833ea684d	[diffusion] fix: make input/output file save paths configurable and disableable (#19580 ) Co-authored-by: Mick <mickjagger19@icloud.com>	2026-03-02 23:02:33 +08:00
Xiaoyu Zhang	53de53fb53	[jit_kernel] Tiny unify jit_kernel tests style (#19694 )	2026-03-02 21:33:59 +08:00
Hexq0210	714c53d609	[NPU] support PD disaggregation on ascend when using PP (#14908 ) Co-authored-by: iridiumine <42236072+iridiumine@users.noreply.github.com>	2026-03-02 21:33:16 +08:00
Bi Xue	eaf18ebe8d	[sgl]add pin_mem to avoid cpu->gpu copy sync point (#19590 )	2026-03-02 21:08:31 +08:00
JiaruiChang5268	b3718982a1	[Feature] add feature mla_ag_after_qlora for dsv3.2 (#19428 ) Co-authored-by: JiaruiChang5268 <changjiarui1@huawei.com>	2026-03-02 20:00:31 +08:00
Shangming Cai	3f36f27eae	[Bugfix] Fix nixl and mori backend for missing decode tp size in PD module (#19690 ) Signed-off-by: Shangming Cai <csmthu@gmail.com>	2026-03-02 19:55:19 +08:00
AichenF	8df9b8dce9	[diffusion] fix: skip USP for cross-attention with replicated KV for wan (#19419 ) Co-authored-by: Mick <mickjagger19@icloud.com>	2026-03-02 19:52:08 +08:00
Leoyzen	da2a0240f7	Add GLM45 tool interruption support (#17714 ) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>	2026-03-02 19:34:12 +08:00
fzyzcjy	7579ab3f33	Enhance error resilience in dump comparator (#19685 )	2026-03-02 19:08:35 +08:00
fzyzcjy	e5ef845cad	Support multiple verbosity in dump comparator (#19684 )	2026-03-02 18:47:30 +08:00
fzyzcjy	3dd4649b42	Beautify text output in dump comparator (#19683 )	2026-03-02 18:47:01 +08:00
fzyzcjy	5bf3deb4bc	Trace execution information in dump comparator (#19682 )	2026-03-02 18:46:27 +08:00
fzyzcjy	abdc0ee71f	Support directory detection in dump comparator (#19680 )	2026-03-02 18:45:35 +08:00
fzyzcjy	6980416149	Support non orthogonal parallel axes and explicit replication annotation in dump comparator (#19679 )	2026-03-02 18:44:33 +08:00
fzyzcjy	a70dd11011	Support flattened dims in dump comparator (#19678 )	2026-03-02 18:43:01 +08:00
fzyzcjy	15e83eea61	Enhance replication check, matching pattern, logging in dump comparator (#19677 )	2026-03-02 18:42:27 +08:00
fzyzcjy	ec44bc82ab	Support presets and arbitrary skipping keys in dump comparator (#19676 )	2026-03-02 18:41:49 +08:00
Mick	2e15c015c0	[diffusion] feat: Add --model-id for config resolution; deprecate model_detectors (#19607 )	2026-03-02 16:39:53 +08:00
kk	15af26d1e8	Add aiter attention support in prefill-attention-backend of gpt-oss (#18282 ) Co-authored-by: wunhuang <wunhuang@amd.com>	2026-03-01 23:39:24 -08:00
ishandhanani	f7da379b61	feat: TTL-based prefix pinning with refresh-on-hit for HiRadixCache (#18941 ) Co-authored-by: Claude <noreply@anthropic.com>	2026-03-01 23:27:22 -08:00
Leon Gao	07ef5f7be1	Remove sync points in mamba cache + prefill cudagraph plumbing for DP (#19639 )	2026-03-02 15:03:42 +08:00
Baidu-AIAK	922aad2faa	Cleanup disagg decode prebuilt flow and add cross-stream sync in merge_batch (#19568 ) Co-authored-by: vincent <vincent@vincentdeMacBook-Pro.local> Co-authored-by: hnyls2002 <lsyincs@gmail.com> Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>	2026-03-01 21:52:27 -08:00
Prozac614	57c5c343d7	[diffusion] model: support Hunyuan3D-2 (#18170 ) Co-authored-by: yingluosanqian <yingluosanqian@gmail.com> Co-authored-by: daiweitao <dwti614707404@163.com> Co-authored-by: Mick <mickjagger19@icloud.com>	2026-03-02 12:28:05 +08:00
Yuan Luo	f6ee6dc8c3	[JIT-kernel] Add unit test for nsa indexer fused_store_k_cache (#19389 ) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>	2026-03-02 12:18:11 +08:00
Shangming Cai	0a6678bf3a	[PD] Remove unused server args for disaggregation (#19618 ) Signed-off-by: Shangming Cai <csmthu@gmail.com>	2026-03-02 11:38:50 +08:00
Henry	e5edf222cd	[WIP]enable mxfp8 on nvidia sm120 (#19112 ) Co-authored-by: Your Name <you@example.com>	2026-03-01 19:06:43 -08:00
SoluMilken	20282f5664	[fix typo] expert_indicies -> expert_indices (#19627 ) Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>	2026-03-01 17:37:34 -08:00
zwang86	f51ddba131	feat: add FA4 SM90 paged KV decode support & update attention docs (#18442 ) Co-authored-by: Zeyu Wang <zeyu.wang@yahooinc.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>	2026-03-02 09:12:19 +08:00
Kangyan-Zhou	98224de29b	[Bugfix] Add missing auto_create_handle_loop to communicator methods (#19610 ) Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-01 15:00:05 -08:00
SoluMilken	0b3ddbcf10	[fix typo] seperated_timestep -> separated_timestep (#19622 )	2026-03-01 14:09:51 -08:00
Kangyan-Zhou	dc02e5bea7	[HiCache] Re-land spec v2 + decode KV cache offloading compatibility (#19615 ) Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-01 13:58:31 -08:00
Ziang Li	0e86977811	[RL] Support per-layer mixed FP8/BF16 serving for FP8 checkpoints (#18742 )	2026-03-01 21:59:22 +08:00
Mick	a75840b373	[diffusion] CI: create and refactor UT (#19619 )	2026-03-01 19:38:20 +08:00

7855 Commits