sglang

mirror of https://github.com/kvcache-ai/sglang.git synced 2026-07-03 13:57:04 +00:00

Author	SHA1	Message	Date
Makcum888e	5f81ec1ad5	[Diffusion] Fix get model name when model local path end with "/" (#18918)	2026-02-17 13:19:54 +03:00
Ratish P	f6cc02489f	[diffusion]: fix sparse video gen 2 backend being applied to cross-attention (#18900) Co-authored-by: ronnie_zheng <zl19940307@163.com>	2026-02-17 13:17:46 +03:00
HAI	b158f5d4a2	Revert "[AMD] Fix RotaryEmbedding crash on AMD/ROCm (regression from #17934)" (#18922)	2026-02-17 01:07:50 -08:00
billishyahao	899e2be7d0	[TBO] fix cuda graph intermittently becomes disabled bug (#18320)	2026-02-16 22:18:57 -08:00
Michael	5e3103a787	[AMD] Fix RotaryEmbedding crash on AMD/ROCm (regression from #17934) (#18903) Co-authored-by: michaelzhang-ai <michaelzhang-ai@users.noreply.github.com>	2026-02-17 12:59:40 +08:00
Mohammad Miadh Angkad	90a0d66e1e	[Tiny] Fix assert syntax warning in compressed_tensors_w4a4_mxint4_moe.py (#18899)	2026-02-17 12:54:30 +08:00
Yilong Zhao	d5307ce022	[misc] adding metadata field in UpdateWeightFromDiskReqInput (#18821)	2026-02-17 12:14:15 +08:00
triple-mu	26b2c63d03	[diffusion] operator: unify rotary embedding impl (#18164) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-02-17 12:02:48 +08:00
pansicheng	b21390f8f3	Adapt the Qwen2Model._update_causal_mask for transformers==4.57.1 (#18774)	2026-02-17 10:20:41 +08:00
Ratish P	50ca24aebb	[diffusion]: fix scheduler crash on ZMQ messages with unexpected frame counts (#17890) Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>	2026-02-17 09:45:05 +08:00
Frank Minors	1b659bcb08	Fix GLM-5 fused shared expert (#18804) Co-authored-by: FrankMinions <liuchen@shinemo.com> Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca> Co-authored-by: Kangyan-Zhou <zky314343421@gmail.com>	2026-02-16 19:50:39 +00:00
danielafrimi	0ff24159a5	Fix modelopt FP8 create weights (#18447) Signed-off-by: root <dafrimi@nvidia.com>	2026-02-17 00:59:50 +08:00
Tamir Baydasov	eba6af385d	[2/N] Quantization Refactor: Compressed tensors MoE schemes (#17503) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: ronnie_zheng <zl19940307@163.com> Co-authored-by: Peng Zhang <aniz1905@gmail.com>	2026-02-16 18:03:51 +03:00
Estrella-xx	1b3513a7e4	refactor FAKE transfer backend and remove --disaggregation-decode-enable-fake-auto parameter (#18345)	2026-02-16 17:27:02 +03:00
Ratish P	c1d1337afc	[diffusion][Wan]: fix sparse attention backends being applied to cross-attention (#17596)	2026-02-16 21:57:58 +08:00
Mohammad Miadh Angkad	b86c6491fa	[Perf] ~9.5x faster Blackwell MXFP4 MoE weight loading (#18858)	2026-02-16 19:47:09 +08:00
Shivam jindal	4f0409f8aa	[Model] Add Qwen3ForRewardModel and fix Qwen3ForSequenceClassification (#17992) Co-authored-by: yes-its-shivam <yes-its-shivam@users.noreply.github.com>	2026-02-16 19:44:41 +08:00
Mick	de833f9e8e	Revert "[diffusion]: Improve layerwise offload buffer reuse and shared-storage handling" (#18866)	2026-02-16 18:00:58 +08:00
Mick	d0c94e136a	[diffusion] logging: improve peak vram logging (#18865)	2026-02-16 16:44:37 +08:00
Yi Zhong	ed22720c07	[JIT kernel] hd=512,1024 in JIT QK norm (cta based) (#17515) Signed-off-by: vincentzed <207368749+vincentzed@users.noreply.github.com>	2026-02-16 16:07:24 +08:00
Alison Shao	206accd15d	Fix GLM-4V processor registration when glm_ocr is unavailable (#18885)	2026-02-16 16:02:31 +08:00
Changyi Yang	61da34ad0b	[diffusion] fix: fix LoRA weight snapshot aliasing in unmerge logic (#18883)	2026-02-16 15:39:45 +08:00
Alison Shao	86c181e335	Fix test_lora_qwen3 nightly failure: replace adapter with added_tokens (#18884)	2026-02-16 14:35:06 +08:00
Douglas Yang	f1efb46bdd	fix: adding performance logging for nightly diffusion (#18023)	2026-02-16 14:09:00 +08:00
fzyzcjy	f554b3c27b	Support dumping gradients, parameters, lazy values (#18881) Co-authored-by: Yueming Yuan <112649537+yueming-yuan@users.noreply.github.com>	2026-02-16 13:34:06 +08:00
fzyzcjy	9a7d8d5eb0	Collect upper level metadata to dump output (#18880)	2026-02-16 13:31:19 +08:00
fzyzcjy	949792d0c6	Change dump output format to dict with value and metadata (#18879)	2026-02-16 13:30:47 +08:00
fzyzcjy	02816abc0d	Flip dumper to disable by default and refactor environment handling (#18878)	2026-02-16 13:29:32 +08:00
Duyi-Wang	5ddc84e33e	[AMD] MORI-EP inter kernel type switch (#18437) Co-authored-by: HAI <hixiao@gmail.com>	2026-02-15 20:59:39 -08:00
Johnsonms	bc79a64d3a	[Diff]: support SGLANG_TORCH_PROFILER_DIR environment variable for profiler log directory (#18454) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-02-16 12:47:29 +08:00
Mick	0af9dcc407	[diffusion] refactor: refactor server_args adjust and validate logics (#18863)	2026-02-16 11:49:06 +08:00
Mick	78b4c9e248	[diffusion] fix: avoid saving output for warmup requests (#18867)	2026-02-16 11:48:28 +08:00
Yuan Luo	8a82c70297	[VLM] Optimize Ernie4.5-VL rotary embedding with fused triton kernel (#18856) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>	2026-02-16 11:19:44 +08:00
Rain Jiang	0ffd0a3995	Nsa trtllm mla sparse fp8 support with Deepseek v3.2 NVFP4 (#18389)	2026-02-16 09:29:54 +08:00
Mike Qiu	b79808bee2	Fix libnuma.so does not exsit (#15355) Signed-off-by: Michael Qiu <qiudayu.qdy@antgroup.com> Co-authored-by: Mike_Qiu <qiudayu.qdy@antgroup.com> Co-authored-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com> Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>	2026-02-16 00:37:50 +08:00
akhilg-nv	48eac1b62d	Improve profiler options for bench_serving (#16991)	2026-02-16 00:36:01 +08:00
Chanh Nguyen	597d17dd18	Use ephemeral nccl port via get_free_port() (#18009) Co-authored-by: Chanh Nguyen <cnguyen@linkedin.com>	2026-02-16 00:32:47 +08:00
tjp_zju	7a607c4900	fix_get_quant_method_in_fused_moe_condition (#18459) Signed-off-by: tom-zju <tanjianpingzju1990@gmail.com> Co-authored-by: Peng Zhang <aniz1905@gmail.com>	2026-02-16 00:31:42 +08:00
WiwilZ	b2f74d660a	fix: add SM110 (Jetson AGX Thor) to Blackwell capability check (#18787)	2026-02-16 00:26:58 +08:00
blake-snc	57f7e06cb9	fix: update Blackwell log/error messages to include SM12x (#18751) Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-16 00:23:51 +08:00
SoluMilken	07a24f1a38	update pre-commit config (#18860)	2026-02-16 00:18:31 +08:00
Ratish P	ddfe147377	[diffusion]: Improve layerwise offload buffer reuse and shared-storage handling (#18611)	2026-02-15 22:17:51 +08:00
Mick	3feb48139e	[diffusion] quant: add support for svdquant and nunchaku (#18549) Co-authored-by: AichenF <aichenf@nvidia.com> Co-authored-by: jianyingzhu <53300651@qq.com>	2026-02-15 20:43:00 +08:00
Michael	88010e9601	[AMD] Fix nightly 1-GPU test failures and bench_serving regression (#18761) Co-authored-by: michaelzhang-ai <michaelzhang-ai@users.noreply.github.com>	2026-02-15 20:36:47 +08:00
fzyzcjy	4c7f986c6b	Extract dumper and prefill delayer tests common utils (#18857)	2026-02-15 18:33:23 +08:00
haowen-han	b992828ad2	fix: fix bug on kimi2.5 with dp2 and tp4 (#18604) Co-authored-by: hanhaowen <hanhaowen@baidu.com>	2026-02-15 16:32:13 +08:00
Ratish P	274bf6607a	[diffusion] fix: enable torch.compile for UlyssesAttention (#18840)	2026-02-15 15:54:27 +08:00
zhangxiaolei123456	ad1bdb93df	perf: add minimax-2.5 fused_moe tuning config for h20 (#18833)	2026-02-15 15:46:56 +08:00
jackey hua	922fbc21e2	[Perf] Tune MiniMax M2 fused moe kernel on H100 GPU (#18851)	2026-02-15 15:30:52 +08:00
andyluo7	944a9f6fcf	Fix/qwen3 5 amd rope cutedsl fallback (#18753) Co-authored-by: seungrokj <seungrok.jung@amd.com>	2026-02-14 22:09:44 -08:00

6437 Commits