sglang

mirror of https://github.com/kvcache-ai/sglang.git synced 2026-07-02 04:37:14 +00:00

Author	SHA1	Message	Date
Baizhou Zhang	776709efe8	[3/n] deepseek_v2.py Refactor: Migrate MLA forward method in deepseek_v2.py (#19122 )	2026-02-27 13:37:29 -08:00
wufann	7e46aafebb	[AMD] Enable cudagraph for aiter nsa backend and add aiter impl for nsa pr… (#18526 )	2026-02-27 13:18:32 -08:00
Shu Wang	1b75d0d1a9	Fix BatchMLAPagedAttentionWrapper query/qo_inptr mismatch for EAGLE (#15601 )	2026-02-27 11:35:45 -08:00
ishandhanani	6a1480ce45	Fix HiCacheNixl TypeError: mem_pool_host passed as file_path (#19517 )	2026-02-27 10:59:32 -08:00
Mohammad Miadh Angkad	35ef38c61b	Remove gpt-oss hybrid swa gate for trtllm_mha (#19079 )	2026-02-27 10:30:00 -08:00
Michael	1b79934d34	[AMD] Fix AMD CI test of TestToolChoiceLfm2Moe (#19113 ) Co-authored-by: michaelzhang-ai <michaelzhang-ai@users.noreply.github.com> Co-authored-by: bingxche <Bingxu.Chen@amd.com> Co-authored-by: yctseng0211 <yctseng@amd.com>	2026-02-27 10:18:15 -08:00
R0CKSTAR	fe4bc8ebd5	[diffusion] fix: MulAdd 4D path (shift indexing) (#18673 ) Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>	2026-02-28 01:52:57 +08:00
Makcum888e	b1249ac909	[Diffusion] [NPU] [CI] fix CI performance (#19486 )	2026-02-27 18:23:02 +03:00
Yuan Luo	d2885a9094	[Qwen3-Next] Support gdn fused_rms_norm_gated (#19434 ) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>	2026-02-27 23:08:08 +08:00
joesun	ca5f2e2ed1	[diffusion] fix: Support default response_format=url in /v1/images/generations to avoid 400 errors when response_format is omitted (#19360 ) Co-authored-by: Makcum888e <79456407+Makcum888e@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-02-27 19:47:38 +08:00
AMD-yanfeiwang	f69ca93d49	[AMD] remove redundancy H2D op in aiter attention backend (#19416 ) Co-authored-by: root <root@mia1-p01-g28.mia.tensorwave.lan>	2026-02-27 02:15:55 -08:00
Yilong Zhao	07da4bed7b	[cache] add conservative estimation (#19482 )	2026-02-27 18:14:46 +08:00
fxmarty-amd	9496bbd7b1	[AMD] Use `tilelang` as default NSA attention backend dispatch on AMD Instinct (#18319 )	2026-02-27 01:43:34 -08:00
ympcMark	43fade5f69	[4/N] (Elastic EP) Back up Expert Weights in DRAM (#17374 ) Co-authored-by: UNIDY2002 <unidy2002@outlook.com>	2026-02-27 15:59:13 +08:00
KnightLTC	eef44ec916	[NPU]kimi k2 thinking bugfix (#19387 ) Co-authored-by: sglang-npu-bot <sglangnpu@163.com>	2026-02-27 15:36:38 +08:00
Cao E	4f0f6cd9d0	Add torch.compile support for qwen3-next on CPU (#12444 )	2026-02-26 23:28:03 -08:00
KnightLTC	bc9190435b	llama4 npu adapt (#17123 ) Co-authored-by: cy <chenyang08056032@163.com> Co-authored-by: sglang-npu-bot <sglangnpu@163.com>	2026-02-27 14:52:37 +08:00
Ken J	f0c2089597	[vlm][internVL] Support processor and embedding inputs for InternVL (#19127 )	2026-02-26 22:46:48 -08:00
triple-mu	8a56cc5836	[diffusion] model: Fix a performance bug in the Wan model about usp (#19340 )	2026-02-27 14:36:23 +08:00
elvischenv	af3eccc9ab	[Perf] Eliminate the slice op for Flashinfer `trtllm_fp4_block_scale_moe` (#15731 ) Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>	2026-02-26 22:23:40 -08:00
wingedge777	d566816d83	fix qwen3_vl visual module loading (#19333 )	2026-02-26 19:44:05 -08:00
Alison Shao	52c8a3632a	Fix missing StandardCombineInput import in BF16 flashinfer_trtllm MoE (#19400 ) Co-authored-by: Alison Shao <alisonshao@MacBook-Pro-D2W773R9CD.local>	2026-02-26 18:32:08 -08:00
Alison Shao	c2dce06d9f	Fix parallel tool call test for speculative decoding variants (#19370 ) Co-authored-by: Alison Shao <alisonshao@Mac.attlocal.net> Co-authored-by: Alison Shao <alisonshao@MacBook-Pro-D2W773R9CD.local>	2026-02-26 18:31:20 -08:00
littleyellowbicycle	5b5c509480	[NPU][feature adapt]remote load weight feature adp npu (#17968 ) Co-authored-by: littleYellowBicycle <liguo29@huawei.com> Co-authored-by: sglang-npu-bot <sglangnpu@163.com>	2026-02-27 10:10:36 +08:00
Yizhong Cao	e567215e44	[bugfix]fix fa4 decoding (#19388 ) Co-authored-by: caoyizhong.cyz <caoyizhong.cyz@alibaba-inc.com>	2026-02-27 09:48:21 +08:00
Sam (Kesen Li)	5194eef88b	[Fix] A followup fix for TRTLLM BF16 MoE (#15303 )	2026-02-26 17:14:20 -08:00
billishyahao	f0d78f2e20	[AMD] Fix weight load shape mismatch for amd dsr1 0528 mxfp4 (#19425 )	2026-02-26 17:11:59 -08:00
fzyzcjy	4add6ec0f6	Enhance displaying and debuggability in dump comparator (#19466 )	2026-02-27 09:06:19 +08:00
fzyzcjy	23cbbd6d41	Integrate packed data context parallel in dump comparator (#19464 )	2026-02-27 08:28:36 +08:00
Liangsheng Yin	1a32c0db4d	[Minor] Rename misleading `chunked` to `reusing` in `ReqToTokenPool.alloc()` (#19465 )	2026-02-26 16:24:03 -08:00
fzyzcjy	8293a914a6	Support token align with packed CP data in dump comparator (#19463 )	2026-02-27 08:12:54 +08:00
fzyzcjy	695e93b91f	Make reorderer support packed format with CP in dump comparator (#19462 )	2026-02-27 08:12:18 +08:00
fzyzcjy	e3cdf6b1a3	Support CP packed format in unsharder in dump comparator (#19461 )	2026-02-27 08:11:47 +08:00
fzyzcjy	f2d1b7c4ad	Support sequence parallel and trivial dims in dump comparator (#19460 )	2026-02-27 08:11:15 +08:00
fzyzcjy	eb2ada3804	Support non-packed format when aligning tokens in dump comparator (#19459 )	2026-02-27 08:10:29 +08:00
fzyzcjy	e1e0cfd856	Use named tensors in dump comparator (#19458 )	2026-02-27 08:09:55 +08:00
fzyzcjy	eb0e905fc3	Update token layout and cleanup printer in dump comparator (#19457 )	2026-02-27 08:09:15 +08:00
fzyzcjy	8ac64e1487	Support unifying axis ordering in dump comparator (#19456 )	2026-02-27 08:08:32 +08:00
fzyzcjy	425d333ee3	Support token dim in arbitrary location in dump comparator (#19455 )	2026-02-27 08:07:38 +08:00
Thomas Wang	5172c37845	[AMD] Use fused GEMM with FP8 cast for FP8 prefill (#19422 )	2026-02-26 15:14:52 -08:00
RoyWang	a1ef8e2cc0	[AMD] optimize Kimi K2.5 fused_moe_triton performance by tuning (#19228 )	2026-02-26 11:50:13 -08:00
Shangming Cai	288300aafd	[PD] Tiny code cleanup for prefill info registering (#19414 ) Signed-off-by: Shangming Cai <csmthu@gmail.com>	2026-02-27 01:35:26 +08:00
zhangheng	e4b708d3e9	[Spec V2] Support specV2 for mamba hybrid attention (#18808 ) Co-authored-by: Yi Zhong <207368749+vincentzed@users.noreply.github.com> Co-authored-by: yizhang2077 <1109276519@qq.com> Co-authored-by: Hanming Lu <hanming@x.ai>	2026-02-27 00:36:01 +08:00
DefTruth	78d6674c45	[diffusion] feat: support hybrid parallelism for diffusers backend (#19405 )	2026-02-27 00:06:08 +08:00
Shangming Cai	e55e65535e	[Bugfix] Add rids to the batch filtering for two batch overlap (#19418 )	2026-02-26 06:57:25 -08:00
Shangming Cai	97f1fa5e6b	[NPU] Fix disaggregation metadata buffer bootstrap_room_dtype for npu backend (#19423 )	2026-02-26 21:10:50 +08:00
khalilzhk	86eb80007e	[NPU] support Kimi-K2.5 on NPU (#19331 )	2026-02-26 20:41:44 +08:00
AlfredYong	bdc1e46e5a	[Qwen3.5] Qwen3.5-27B inference repeat bug fix (#19411 )	2026-02-26 20:11:29 +08:00
Xiaoyu Zhang	74c8e7b215	refactor(jit_kernel): reduce duplication and separate test code (#19323 )	2026-02-26 18:30:49 +08:00
Junhao Liu	a7152df2e3	[diffusion ] CLI: Fix typo in CLI usage doc string (#19316 )	2026-02-26 13:24:14 +03:00

7855 Commits