sglang

mirror of https://github.com/kvcache-ai/sglang.git synced 2026-06-30 19:57:52 +00:00

Author	SHA1	Message	Date
Duyi-Wang	8c190f6b91	[AMD] Add SGLANG_MORI_MOE_MAX_INPUT_TOKENS to truncate dispatch before MoE. (#22952)	2026-04-16 23:40:15 -07:00
Jan Bernlöhr	04a53955b9	feat: add coordinated checkpoint prefetch for network filesystem loading (#20843)	2026-04-16 20:08:19 -07:00
Baizhou Zhang	d14d368191	[Kernel] Set sgl_per_token_group_quant_8bit_v2 as default choice (#22467)	2026-04-11 01:59:57 -07:00
billishyahao	1df9f4e2f6	[AMD] Add prealloc token env for mori-ep (#22329)	2026-04-09 09:34:35 -07:00
Nicolas Castet	e379befbac	Add symmetric debug mode to print stack trace of comm ops with unregistered tensors (#18569)	2026-04-08 22:34:58 -07:00
Rain Jiang	1a8eb890f6	Kernels community fa3 (#20796)	2026-04-07 12:48:44 -07:00
YAMY	dc125afffb	Add staging buffer CI test and documentation for heterogeneous TP (#21921) Co-authored-by: Shangming Cai <csmthu@gmail.com>	2026-04-06 14:00:20 +08:00
Brayden Zhong	6aafe756b9	Revert "[Feature] NVFP4 Marlin fallback for non-Blackwell GPUs (SM75+… (#22047)	2026-04-03 13:12:30 -07:00
Mook	991f3aa5b3	[Feature] NVFP4 Marlin fallback for non-Blackwell GPUs (SM75+) (#19652)	2026-04-03 10:48:15 +08:00
Aishwarya Ramasethu	c32ee48886	MFU metrics in Prometheus (#19395)	2026-03-29 23:40:06 -07:00
Shu Wang	efebcab43e	Support skip-softmax attention (#19089)	2026-03-28 15:55:48 -07:00
Baizhou Zhang	edd4d54023	[Clean] Remove deprecated environs (#21536)	2026-03-28 00:35:44 -07:00
Duyi-Wang	61a902ce88	[AMD][MoRI] Auto-select dispatch quantization type from MoE weight dtype. (#21040)	2026-03-24 22:53:57 -07:00
Jiaxin(Jackson) Deng	c4db64c16b	Add Lychee Doc Links Check to Local and CI (#19742) Co-authored-by: Zijie Xia <zijie_xia@icloud.com> Co-authored-by: Zijie Xia <zijiexia@users.noreply.github.com> Co-authored-by: zijiexia <37504505+zijiexia@users.noreply.github.com>	2026-03-24 13:48:26 -07:00
Xiaoyu Zhang	766d225fcc	Add SGLang CUDA crash API logging inspired by FlashInfer (#20910)	2026-03-22 16:39:40 +08:00
Qiaolin Yu	c5d2528bff	Revert "[AMD][MORI] Fix MTP crash with FP4/FP8 dispatch and add NEXTN dispatch env vars." (#20797)	2026-03-17 17:28:09 -07:00
Duyi-Wang	385a35bd11	[AMD][MORI] Fix MTP crash with FP4/FP8 dispatch and add NEXTN dispatch env vars. (#20647)	2026-03-17 01:13:42 -07:00
AnonTokyo	e9fae69e5f	docs: align environment variable reference with environ defaults (#20419)	2026-03-15 18:07:29 -07:00
Baizhou Zhang	39008955ff	Revert "[AMD][MORI] Fix MTP crash with FP4/FP8 dispatch and add NEXTN dispatch env vars." (#20602)	2026-03-14 12:12:42 -07:00
Duyi-Wang	0eea80bc00	[AMD][MORI] Fix MTP crash with FP4/FP8 dispatch and add NEXTN dispatch env vars. (#20453)	2026-03-13 14:03:17 -07:00
StonyPort	d4e68ead1d	[quant] Ignore FP8 quantization layers (#20340) Co-authored-by: qiuxuan.lzw <qiuxuan.lzw@alibaba-inc.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-03-13 13:59:39 +08:00
Baizhou Zhang	be63f982b7	[V32/GLM5] Control the threshold of applying dense attention with an environ (#20062)	2026-03-09 14:36:10 -07:00
AMD-yanfeiwang	f0153ad225	[AMD][Feature] support fp4 dispatch and fp8 combine in moriep (#19757) Co-authored-by: Duyi-Wang <duyi.wang@amd.com>	2026-03-09 12:52:05 -07:00
yuyu5333	230fb55899	[Performance] Decode Offload improves the long texts performance 100% through dynamic block offload. (#17216) Co-authored-by: zhangheng <hzh0425@apache.org>	2026-03-08 17:16:53 +08:00
shuwenn	7bd3dd9270	fix: image URL in notebook to use raw.githubusercontent.com (#20100)	2026-03-07 13:28:20 -08:00
StonyPort	806d41ab65	[quant] fix fp32 downcasting (#19844) Co-authored-by: qiuxuan.lzw <qiuxuan.lzw@alibaba-inc.com>	2026-03-05 17:54:59 +08:00
Brayden Zhong	e2af840c3d	Various SM120 improvements (#19721)	2026-03-03 16:46:13 -08:00
Feng Su	3b89302277	Refactor: observability code cleanup (#17862) Signed-off-by: Feng Su <sufeng@linux.alibaba.com>	2026-02-24 18:07:29 -08:00
shuwenn	0c224c3c62	docs: Embed release lookup tool into Sphinx documentation site (#19264)	2026-02-24 11:11:27 -08:00
Duyi-Wang	5ddc84e33e	[AMD] MORI-EP inter kernel type switch (#18437) Co-authored-by: HAI <hixiao@gmail.com>	2026-02-15 20:59:39 -08:00
shuwenn	3299c4f9c1	[CI] feat: add early exit to wait_for_server when process dies (#18602)	2026-02-13 16:46:09 -08:00
Liangsheng Yin	e6f7a372ef	Rename request timeout env vars for waiting/running stages (#18766)	2026-02-12 22:58:40 -08:00
qianyue76	f06ab17a73	[diffusion] docs: consolidate diffusion documentation into docs (#18095) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: JiaxinD <djx2048@gmail.com>	2026-02-11 16:55:07 -08:00
Rishit Shivam	c850a8a41a	[Docs] Add Falcon H1, Hunyuan-Large, Qwen3-Omni support and update Diffusion usage (#17888) Co-authored-by: Rishitshivam <164783543+Rishitshivam@users.noreply.github.com> Co-authored-by: Ratish P <114130421+Ratish1@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Adarsh Shirawalmath <114558126+adarshxs@users.noreply.github.com> Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>	2026-02-06 13:17:51 -08:00
rinbaro	de6a03260f	[docs] fix misspellings & typos (#18276)	2026-02-05 03:35:29 +00:00
Teng Ma	c8212b9fac	[PD] doc: Document SGLANG_MOONCAKE_CUSTOM_MEM_POOL and supported values (#18259) Co-authored-by: Shangming Cai <csmthu@gmail.com>	2026-02-05 03:03:33 +00:00
Baizhou Zhang	d279520ba5	[DeepGemm] Add a flag for fast warmup (#18111)	2026-02-04 14:12:13 +08:00
Yuan Luo	afebb7ab78	Optimize custom-all-reduce (#17674) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>	2026-02-01 18:59:31 +08:00
StonyPort	2b3408ff14	feat: add forward timeout (#17831) Co-authored-by: qiuxuan.lzw <qiuxuan.lzw@alibaba-inc.com>	2026-01-30 08:52:29 +08:00
Joe Redmond	0ff0d181ca	feat: add custom request header logging (#17786)	2026-01-28 19:33:08 -08:00
kk	f1384f5293	Integration mori backend for EP a2a data communication (#17012) Co-authored-by: Duyi-Wang <duyi.wang@amd.com> Co-authored-by: billishyahao <bill.he@amd.com> Co-authored-by: HaiShaw <hixiao@gmail.com>	2026-01-28 19:07:34 -08:00
Trevor Morris	2c2c4e446b	[NVIDIA] Add flashinfer all-to-all MOE dispatcher (#14668)	2026-01-24 22:59:55 +08:00
Nicolas Castet	48e9daadff	Support symmetric memory pre-allocation to avoid fragmentation (#17089)	2026-01-23 17:57:04 +08:00
Baizhou Zhang	6ea491e439	Overlap shared experts with deepep dispatch for single batch overlap on Blackwell (#17289)	2026-01-21 02:56:55 +08:00
Baizhou Zhang	55c616427d	Add flag that enables NCCL mlp sync batch for overlap scheduler (#17288)	2026-01-20 23:06:55 +08:00
b8zhong	f374623fa9	[Refactor] Set `fp4-gemm-backend=auto` on SM100 and rename `fp4-gemm-backend` with `flashinfer_` prefix (#17309)	2026-01-19 20:09:07 +08:00
StonyPort	3355b6e21b	feat: add request queued timeout (#17143) Co-authored-by: qiuxuan.lzw <qiuxuan.lzw@alibaba-inc.com> Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>	2026-01-16 17:55:09 +08:00
siyu	068abe7e40	add doc for #14386 (#14655)	2026-01-09 22:38:51 +08:00
Baizhou Zhang	7d757d6f17	Clean Some Environment Variables for DeepSeek V32 (#15938)	2026-01-07 14:00:16 +08:00
Huaixin Chang	c1dfbc777b	deprecate prefill-round-robin-balance (#16195) Signed-off-by: Chang Huaixin (OpenAnolis) <changhuaixin@linux.alibaba.com> Co-authored-by: Liangsheng Yin <lsyincs@gmail.com>	2025-12-31 22:25:33 +08:00

301 Commits