sglang

mirror of https://github.com/kvcache-ai/sglang.git synced 2026-07-01 12:17:09 +00:00

Author	SHA1	Message	Date
Mick	6cc5717e8a	[diffusion] doc: update quantization.md (#21356)	2026-03-25 14:48:38 +08:00
Alison Shao	17e41cfb21	Fix RDMA device mapping for non-zero GPU indices in disaggregation tests (#21303) Co-authored-by: Alison Shao <alison.shao@MacBook-Pro-D2W773R9CD.local> Co-authored-by: hnyls2002 <lsyincs@gmail.com>	2026-03-24 22:56:57 -07:00
Duyi-Wang	61a902ce88	[AMD][MoRI] Auto-select dispatch quantization type from MoE weight dtype. (#21040)	2026-03-24 22:53:57 -07:00
kk	86e2622097	[AMD] Add mha fp8-kv support (#21253) Co-authored-by: wunhuang <wunhuang@amd.com>	2026-03-24 22:38:02 -07:00
Baizhou Zhang	2b75fed0dd	Workaround of DSA performance drop on B200 + DP (#21337)	2026-03-24 22:21:07 -07:00
Ke Bao	92492896a5	Fix disaggregation test bootstrap port conflict (#21271)	2026-03-24 21:14:41 -07:00
Ke Bao	c1d930c028	Increase flush cache timeout in hicache CI (#21305)	2026-03-24 19:00:59 -07:00
Yuan Luo	f273ba1ccc	[KDA] Support CuTeDSL KDA decode kernel (#21203) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>	2026-03-25 09:47:09 +08:00
DarkSharpness	dfc15b78b0	[misc] clean up kernel API (#21325)	2026-03-25 09:10:23 +08:00
ykcai-daniel	281fe10b5e	[diffusion] quant: support nvfp4 for Flux.2 (#20137) Co-authored-by: zcnrex <zcnrex@gmail.com> Co-authored-by: BBuf <1182563586@qq.com> Co-authored-by: Yikang Cai <dcai@catalyst-fleet1.cs.cmu.edu> Co-authored-by: CHEN Xi <78632976+RubiaCx@users.noreply.github.com> Co-authored-by: RubiaCx <1084281732@qq.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Mick <mickjagger19@icloud.com>	2026-03-25 08:28:25 +08:00
Liangsheng Yin	37420dce0b	[CI] Enable failfast (`-f`) by default in `run_suite.py` (#21330)	2026-03-24 17:04:42 -07:00
Baizhou Zhang	1046dbe038	[Fix] Fix trtllm fp4 moe kernel not found error (#21343)	2026-03-24 16:38:05 -07:00
Mohammad Miadh Angkad	bbe25b2412	Use FlashInfer tinygemm for GPT-OSS MoE router on SM90+ (#20755) Co-authored-by: elvischenv <219235043+elvischenv@users.noreply.github.com>	2026-03-24 15:00:18 -07:00
Jiaxin(Jackson) Deng	c4db64c16b	Add Lychee Doc Links Check to Local and CI (#19742) Co-authored-by: Zijie Xia <zijie_xia@icloud.com> Co-authored-by: Zijie Xia <zijiexia@users.noreply.github.com> Co-authored-by: zijiexia <37504505+zijiexia@users.noreply.github.com>	2026-03-24 13:48:26 -07:00
Jonah Bernard	a32e0d57e7	[LoRA][III] Add LoRA support for MoE layers and enable TP (#14105) Co-authored-by: Yusheng Su <yushengsu.thu@gmail.com> Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>	2026-03-24 13:14:14 -07:00
Zhang Yiyang (SII)	a3ed2e4d29	[diffusion][CI] Add CI for MOVA model inference (#20430) Co-authored-by: Luo <139519292+0-693@users.noreply.github.com> Co-authored-by: Mick <mickjagger19@icloud.com>	2026-03-24 21:28:16 +03:00
YC Yen-Ching Tseng	71f5ae3f9a	[AMD] Fix AMD Nightly Test - Transformers 5.3.0 incompatibility and gemma2-27b kv issue (#21193) Co-authored-by: bingxche <Bingxu.Chen@amd.com>	2026-03-24 10:41:44 -07:00
Elizaveta Martirosian	9f4d8ac99f	[Diffusion][NPU] Add support for Hunyuan3D (#20352) Co-authored-by: Elizaveta Martirosian <elizaveta.martirosian@gmail.com>	2026-03-24 16:18:49 +03:00
shadowxz109	1b4933d45d	[NPU][ModelSlim] adapt w2 quant layer for Minimax2.5 (#20905)	2026-03-24 20:57:18 +08:00
Aleksi Vesanto	eefb504f84	[diffusion] model: Fix FLUX.1 output correctness (#21041) Co-authored-by: Mick <mickjagger19@icloud.com>	2026-03-24 15:17:33 +03:00
Mohammad Miadh Angkad	4fbb311234	[Fix][Eval] Keep `--dataset-path` scoped to `longbench_v2` (#21156)	2026-03-24 02:25:11 -07:00
Thomas Wang	855d15adf6	[AMD] Tilelang sparse fwd for dsv32 mi355/mi300 (#19945)	2026-03-24 02:01:39 -07:00
Shunkangz	dac148167c	Enable the qwen3 test (#21195) Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>	2026-03-23 23:40:59 -07:00
Xiaoyu Zhang	69f02e36e8	[diffusion] Fix torch.zeros typo in causal wan (#21250)	2026-03-24 14:39:16 +08:00
Xiaoyu Zhang	d9f97b2115	Refine diffusion skills and align JIT kernel docs with the new CI flow (#21283)	2026-03-24 14:38:36 +08:00
Cheng Wan	c01ee848b0	Revert "fix: use consistent time denominator for throughput metrics in bench_one_batch_server" (#21276)	2026-03-23 22:14:54 -07:00
Baidu-AIAK	6491728797	[Perf] Overlap NSA-CP key all-gather with query computation for DeepSeek-V3.2 (#20438) Co-authored-by: Shurui Jia <18817781975@163.com> Co-authored-by: Baidu-AIAK <baiduaiak~123>	2026-03-23 21:31:48 -07:00
Lianmin Zheng	260abe1fb1	Refactor JIT kernel CI to use run_suite.py registration system (#21239)	2026-03-23 21:17:27 -07:00
hzh0425	0986bed8e2	[HiCache][HybridModel]: Support mamba state offloading & HybridCacheController (#20457) Co-authored-by: pansicheng <sicheng.pan.chn@gmail.com> Co-authored-by: 晟海 <huangtingwei.htw@antgroup.com> Co-authored-by: ispobock <ispobaoke@gmail.com>	2026-03-23 20:02:50 -07:00
Ratish P	2b1d3c935e	[diffusion] fix Z-Image SP sharding for portrait and padded resolutions (#21042) Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>	2026-03-24 10:15:33 +08:00
Yuxuan Zhang	fcaad42b00	[Bug Fix] GLM-V / GLM-OCR: field detection for transformers 5.x and MTP omission fix (#21134)	2026-03-23 13:19:48 -07:00
Baizhou Zhang	ed316a26ef	Fix CP in-seq-split method for DeepSeek V32 and update related tests (#21192)	2026-03-23 12:34:10 -07:00
Lianmin Zheng	27ac831a84	docs: improve CI and testing documentation (#21202) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-23 10:48:50 -07:00
jacky.cheng	b4d3fb001d	[AMD] Add fused GemmaRMSNorm forward_hip to use aiter/vllm kernels for qwen3.5 (#21188)	2026-03-23 10:21:36 -07:00
Johnsonms	777edb6ef7	Fix(jit): support rmsnorm for hidden_size in {64, 128, 256} (#20661)	2026-03-23 23:17:44 +08:00
Yuan Luo	5bdc07d974	[Qwen3.5] Fuse split/reshape/cat ops in GDN projection with Triton kernel (#21019) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>	2026-03-23 23:17:01 +08:00
McZyWu	8662ba7db4	[NPU] bugfix for import sgl-kernel error (#21200)	2026-03-23 19:52:36 +08:00
strgrb	80d4a0753a	fix fused_set_kv_buffer for rope with Ling-v2 (#20316)	2026-03-23 19:20:40 +08:00
McZyWu	4641e5a3d2	[NPU] enhance accuracy for model minimaxm2 from 16.5% to 95.5% (#17695)	2026-03-23 19:06:38 +08:00
XDaoHong	2d288ba8c9	[Bugfix] fix npu get kv_item_lens in PD separation when use ASCEND_US… (#15852) Co-authored-by: ZhengdQin <zhengdqin@gmail.com>	2026-03-23 15:56:47 +08:00
kpham-sgl	59cb9a9da6	[Spec][Ngram] 3/N: Fix synchronization issues in `Ngram.cpp` (#21186) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-23 00:37:07 -07:00
Lianmin Zheng	814202704b	ci: unify PR test suite naming (#21187)	2026-03-23 00:18:45 -07:00
yudian0504	3d312643b9	[BUGFIX] Fix CP residual size mismatch crash when tp_size == attn_cp_size (#21170)	2026-03-23 00:12:58 -07:00
Lianmin Zheng	7757a9ddd0	ci: remove IS_BLACKWELL env var; auto-detect Blackwell (#21118)	2026-03-22 23:44:48 -07:00
kpham-sgl	bc4aaab6a1	[Spec][Ngram] 2/N: Rename branch length to max trie depth (#21181) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-22 23:35:25 -07:00
Cheng Wan	d6b12c401c	Revert "[bugfix] Fix PPMissingLayer AttributeError when Using PP" (#21189)	2026-03-22 23:28:36 -07:00
Zhiqiang Xie	13f4f010d8	HiSparse for Sparse Attention (#20343)	2026-03-22 23:09:31 -07:00
Lianmin Zheng	7050011dee	Enable JIT clamp_position and resolve_future_token_ids on ROCm (#21116)	2026-03-22 22:33:54 -07:00
Yuhao Yang	32a85ef128	[diffusion] CI: auto-skip diffusion tests when required pipeline class is missing from diffusers (#21139)	2026-03-23 12:15:21 +08:00
Mohammad Miadh Angkad	d8a5b1dbaf	[Bugfix] Work around FlashInfer unified transport issue on GB (#20039)	2026-03-22 21:10:25 -07:00

7221 Commits