sglang

mirror of https://github.com/kvcache-ai/sglang.git synced 2026-07-03 22:07:12 +00:00

Author	SHA1	Message	Date
Mick	6503f94211	[diffusion] feat: support passing component path via server args (#19108 )	2026-02-21 21:22:47 +08:00
Mick	b89ca65789	[diffusion] refactor: reduce redundancy and improve stage api (#19060 )	2026-02-21 16:35:47 +08:00
赵晨阳	e239f8aa85	Remove error dllm and diffusion doc in basic_useage (#19105 )	2026-02-20 20:28:00 -08:00
billishyahao	fbb6098487	[AMD] support two batch overlapping for mori ep (#17953 ) Co-authored-by: kkHuang-amd <wunhuang@amd.com> Co-authored-by: Feiyue Zhai <feiyue.zhai@amd.com> Co-authored-by: Duyi-Wang <duyi.wang@amd.com> Co-authored-by: HAI <hixiao@gmail.com>	2026-02-20 08:45:55 -08:00
chengshuang18	295bc17576	Feature/sdar support (#19044 ) Co-authored-by: root <root@gpu-lg-cmc-h-h200-3047.host.h.pjlab.org.cn> Co-authored-by: chengshuang <chengshuang@pjlab.org.cn> Co-authored-by: 赵晨阳 <zhaochen20@outlook.com>	2026-02-19 21:58:15 -08:00
Cheng Wan	73a7f0d049	Revert "Add SDAR model support" (#19032 )	2026-02-19 16:03:56 -08:00
chengshuang18	44ab752b7a	Add SDAR model support (#18318 ) Co-authored-by: root <root@gpu-lg-cmc-h-h200-3047.host.h.pjlab.org.cn> Co-authored-by: chengshuang <chengshuang@pjlab.org.cn> Co-authored-by: 赵晨阳 <zhaochen20@outlook.com>	2026-02-19 11:20:32 -08:00
Mohammad Miadh Angkad	2f592c3b18	[Doc] Add `flashinfer_deepgemm` to `--fp8-gemm-backend` (#18982 )	2026-02-18 14:45:47 -05:00
Mengyang Liu	4f980f6f23	[Feature] Implement update_weights_from_disk for SGLang-D (Diffusion … (#18306 ) Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>	2026-02-18 11:24:07 -08:00
HAI	934b36693c	Reasoning models fix docs (#18963 )	2026-02-17 23:05:55 -08:00
Makcum888e	14c95d255c	[Diffusion] [NPU] [Doc] Add NPU documentation for sglang-diffusion (#18894 ) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-02-17 10:12:20 +03:00
Estrella-xx	1b3513a7e4	refactor FAKE transfer backend and remove --disaggregation-decode-enable-fake-auto parameter (#18345 )	2026-02-16 17:27:02 +03:00
RAY	d85884ca57	Update ascend_npu_qwen3_5_examples.md (#18888 )	2026-02-16 10:24:27 +03:00
Douglas Yang	f1efb46bdd	fix: adding performance logging for nightly diffusion (#18023 )	2026-02-16 14:09:00 +08:00
Duyi-Wang	5ddc84e33e	[AMD] MORI-EP inter kernel type switch (#18437 ) Co-authored-by: HAI <hixiao@gmail.com>	2026-02-15 20:59:39 -08:00
Rain Jiang	0ffd0a3995	Nsa trtllm mla sparse fp8 support with Deepseek v3.2 NVFP4 (#18389 )	2026-02-16 09:29:54 +08:00
chenxu214	fd5a45d5cf	Update ascend_npu_support.rst (#18868 )	2026-02-16 01:41:38 +08:00
chenxu214	f2d72866e9	Create ascend_npu_qwen3_5_examples.md (#18864 )	2026-02-16 01:15:20 +08:00
SoluMilken	07a24f1a38	update pre-commit config (#18860 )	2026-02-16 00:18:31 +08:00
Bhavneek Singh	1ce3420784	Model: Support IBM Granite (Dense/Mamba + MoE) (#18040 )	2026-02-15 11:24:41 +08:00
shuwenn	4cf4f0859f	[Doc] Convert the speculative decoding notebook to markdow (#18395 )	2026-02-14 18:18:56 -08:00
Kangyan-Zhou	3a1c388b43	Update performance dashboard for nightly tests (#18824 )	2026-02-14 09:28:28 +08:00
shuwenn	3299c4f9c1	[CI] feat: add early exit to wait_for_server when process dies (#18602 )	2026-02-13 16:46:09 -08:00
dongjiyingdjy	8b4c364960	refactor context parallel state (#17213 ) Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>	2026-02-13 23:18:17 +08:00
Xinwei Qiang	356e338607	[diffusion] feat: support SparseVideoGen2 attention backend (#17507 ) Co-authored-by: Mick <mickjagger19@icloud.com>	2026-02-13 16:20:46 +08:00
Liangsheng Yin	e6f7a372ef	Rename request timeout env vars for waiting/running stages (#18766 )	2026-02-12 22:58:40 -08:00
HuangJi	f4d80f9d42	[diffusion] feat: allows quality adjustment of generated images/videos (#17937 )	2026-02-13 11:56:20 +08:00
BourneSun0527	f65c885e7c	Modify glm5 readme on npu (#18768 )	2026-02-13 11:42:40 +08:00
shuwenn	bc2405e6c1	feat: support release lookup (#18450 )	2026-02-13 10:47:02 +08:00
danielafrimi	e422bcaed8	[Mamba] Add float16 support for SSM cache dtype (#18444 )	2026-02-12 11:27:47 +08:00
fy	123f57b84b	update glm5 readme on npu (#18657 )	2026-02-12 10:37:12 +08:00
liupeng374	c34832c02c	glm5 md (#18655 )	2026-02-12 10:11:59 +08:00
qianyue76	f06ab17a73	[diffusion] docs: consolidate diffusion documentation into docs (#18095 ) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: JiaxinD <djx2048@gmail.com>	2026-02-11 16:55:07 -08:00
Baizhou Zhang	947927bdb5	[V3.2] Change default CP token split method to `--round-robin-split` (#18613 )	2026-02-11 20:14:35 +08:00
赵晨阳	a2c38f7796	Enhance SMG guide with RL rollout systems benefits (#18588 )	2026-02-10 20:20:45 -08:00
AlexZhao	3167bcc01c	[Doc] Comprehensive Guide: Navigating DP, DPA, and SMG Best Practices (#18096 ) Co-authored-by: 赵海源 <zhaohaiyuan@xiaohongshu.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>	2026-02-10 18:31:28 -08:00
husf	573ff55814	[NPU][docs]fix bug about hyperlink for best practice for ascend npu (#18561 )	2026-02-10 20:03:28 +03:00
Hexq0210	d0d387dea1	[NPU] update npu doc (#18474 )	2026-02-10 16:59:13 +03:00
husf	99101ce30b	[NPU][docs] improve docs for Best Practice on Ascend NPU (#18360 )	2026-02-10 16:52:27 +03:00
Zack Yu	54589a2f2d	docs: expand and update modelopt documentation (#18479 ) Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-02-09 23:09:52 +00:00
brimon	ddbcfbaaab	feature: support bidirectional attention for Gemma-3 (#10707 )	2026-02-09 23:17:45 +08:00
Junlin Zhou	14652243bd	[DLLM] Add JointThreshold algorithm for joint M2T and T2T decoding (#18171 ) Signed-off-by: Junlin Zhou <zhoujunlin.zjl@antgroup.com> Co-authored-by: Tiwei Bie <tiwei.btw@antgroup.com>	2026-02-09 14:20:45 +08:00
Mohammad Miadh Angkad	fddef76619	[Doc] Fix outdated `--fp4-gemm-backend` documentation (#18350 )	2026-02-07 20:42:47 +08:00
Mohammad Miadh Angkad	c47c2f9466	[Doc] Update CUDA 13 install guide to install torch first (#18404 ) Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>	2026-02-07 18:04:37 +08:00
Hexq0210	e834b85ab6	[NPU] update npu doc (#18344 )	2026-02-07 16:38:05 +08:00
Rishit Shivam	c850a8a41a	[Docs] Add Falcon H1, Hunyuan-Large, Qwen3-Omni support and update Diffusion usage (#17888 ) Co-authored-by: Rishitshivam <164783543+Rishitshivam@users.noreply.github.com> Co-authored-by: Ratish P <114130421+Ratish1@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Adarsh Shirawalmath <114558126+adarshxs@users.noreply.github.com> Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>	2026-02-06 13:17:51 -08:00
amote-i	92b8bd6833	fix npu best practice (#18330 )	2026-02-05 21:14:46 -05:00
shuwenn	ef1d0ea885	[Doc] add a summary section for spec decode document (#18323 )	2026-02-05 16:34:31 -05:00
shuwenn	8b21dd4b77	[Doc] refine spec decode docs for SpecV2/STANDALONE/NGRAM (#18321 )	2026-02-05 15:12:33 -05:00
Kun Lin	e616d35847	Support Markdown/Notebook-Friendly Documentation Export for Downstream Integration(convert rat files to md files and save) (#18278 )	2026-02-04 19:59:40 -08:00

1089 Commits