sglang

mirror of https://github.com/kvcache-ai/sglang.git synced 2026-07-03 13:57:04 +00:00

Author	SHA1	Message	Date
billishyahao	fbb6098487	[AMD] support two batch overlapping for mori ep (#17953 ) Co-authored-by: kkHuang-amd <wunhuang@amd.com> Co-authored-by: Feiyue Zhai <feiyue.zhai@amd.com> Co-authored-by: Duyi-Wang <duyi.wang@amd.com> Co-authored-by: HAI <hixiao@gmail.com>	2026-02-20 08:45:55 -08:00
Mohammad Miadh Angkad	2f592c3b18	[Doc] Add `flashinfer_deepgemm` to `--fp8-gemm-backend` (#18982 )	2026-02-18 14:45:47 -05:00
Mengyang Liu	4f980f6f23	[Feature] Implement update_weights_from_disk for SGLang-D (Diffusion … (#18306 ) Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>	2026-02-18 11:24:07 -08:00
Estrella-xx	1b3513a7e4	refactor FAKE transfer backend and remove --disaggregation-decode-enable-fake-auto parameter (#18345 )	2026-02-16 17:27:02 +03:00
Rain Jiang	0ffd0a3995	Nsa trtllm mla sparse fp8 support with Deepseek v3.2 NVFP4 (#18389 )	2026-02-16 09:29:54 +08:00
SoluMilken	07a24f1a38	update pre-commit config (#18860 )	2026-02-16 00:18:31 +08:00
shuwenn	4cf4f0859f	[Doc] Convert the speculative decoding notebook to markdow (#18395 )	2026-02-14 18:18:56 -08:00
shuwenn	3299c4f9c1	[CI] feat: add early exit to wait_for_server when process dies (#18602 )	2026-02-13 16:46:09 -08:00
dongjiyingdjy	8b4c364960	refactor context parallel state (#17213 ) Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>	2026-02-13 23:18:17 +08:00
danielafrimi	e422bcaed8	[Mamba] Add float16 support for SSM cache dtype (#18444 )	2026-02-12 11:27:47 +08:00
qianyue76	f06ab17a73	[diffusion] docs: consolidate diffusion documentation into docs (#18095 ) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: JiaxinD <djx2048@gmail.com>	2026-02-11 16:55:07 -08:00
Baizhou Zhang	947927bdb5	[V3.2] Change default CP token split method to `--round-robin-split` (#18613 )	2026-02-11 20:14:35 +08:00
赵晨阳	a2c38f7796	Enhance SMG guide with RL rollout systems benefits (#18588 )	2026-02-10 20:20:45 -08:00
AlexZhao	3167bcc01c	[Doc] Comprehensive Guide: Navigating DP, DPA, and SMG Best Practices (#18096 ) Co-authored-by: 赵海源 <zhaohaiyuan@xiaohongshu.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>	2026-02-10 18:31:28 -08:00
Zack Yu	54589a2f2d	docs: expand and update modelopt documentation (#18479 ) Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-02-09 23:09:52 +00:00
Mohammad Miadh Angkad	fddef76619	[Doc] Fix outdated `--fp4-gemm-backend` documentation (#18350 )	2026-02-07 20:42:47 +08:00
shuwenn	ef1d0ea885	[Doc] add a summary section for spec decode document (#18323 )	2026-02-05 16:34:31 -05:00
shuwenn	8b21dd4b77	[Doc] refine spec decode docs for SpecV2/STANDALONE/NGRAM (#18321 )	2026-02-05 15:12:33 -05:00
rinbaro	de6a03260f	[docs] fix misspellings & typos (#18276 )	2026-02-05 03:35:29 +00:00
Teng Ma	c8212b9fac	[PD] doc: Document SGLANG_MOONCAKE_CUSTOM_MEM_POOL and supported values (#18259 ) Co-authored-by: Shangming Cai <csmthu@gmail.com>	2026-02-05 03:03:33 +00:00
Viacheslav	74f716dbd7	Gigachat 3 tool parser and tests (#14765 )	2026-02-02 22:28:34 -08:00
husf	c52578c7fd	【docs】【NPU】Update Expert Parallelism docs for Ascend NPU (#17940 )	2026-01-30 23:42:11 -05:00
Ziang Li	3c9cc44ff5	Add mxfp8 support for online quantization, Triton dense linear, and CUTLASS MoE (#17449 )	2026-01-29 21:33:57 +08:00
kk	f1384f5293	Integration mori backend for EP a2a data communication (#17012 ) Co-authored-by: Duyi-Wang <duyi.wang@amd.com> Co-authored-by: billishyahao <bill.he@amd.com> Co-authored-by: HaiShaw <hixiao@gmail.com>	2026-01-28 19:07:34 -08:00
Baizhou Zhang	832c756549	[Doc] Tiny update description on torch compile (#17819 )	2026-01-27 18:59:04 +08:00
shuwenn	fd3b179ffd	[HiCache][HA 1/N] Support HiCache storage runtime attach/detach (#15892 )	2026-01-26 19:33:19 -08:00
zijiexia	dd97e1fe38	[Docs] Add RL documentation (#17663 ) Co-authored-by: JD <jaedon.guo@gmail.com>	2026-01-26 12:16:54 -08:00
zackyoray	d275d47973	[NIXL] Add custom NIXL backend selection for KVManager (#17146 ) Signed-off-by: Yoray Zack <yorayz@nvidia.com>	2026-01-26 14:35:38 +08:00
Trevor Morris	2c2c4e446b	[NVIDIA] Add flashinfer all-to-all MOE dispatcher (#14668 )	2026-01-24 22:59:55 +08:00
Glen Liu	a6280b2a23	add documentation example for LoRA overlap loading and cleanup unused function (#17464 )	2026-01-24 15:33:16 +08:00
Yi Zhong	08fcda2f63	add the fa4 mm backend and varlen func (#13539 ) Signed-off-by: vincentzed <207368749+vincentzed@users.noreply.github.com> Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>	2026-01-23 23:12:06 +08:00
hxie	13f88045b3	configuration file support and nixl integration augmentation for hicache-storage-backend-extra-config (#16602 )	2026-01-22 14:31:48 -08:00
zijiexia	4ecd9afde9	[Docs] Rename SGLang Router to SGLang Model Gateway (#17436 )	2026-01-20 12:31:10 -08:00
Ruihuan He	eb38d64413	[Docs] Explain CUDA attention backend choices and aiter FP8 KV cache support (#17428 )	2026-01-20 11:10:36 -08:00
Shangming Cai	23d765d1f2	[Doc] Update pipeline parallelism documentation (#17414 )	2026-01-20 20:10:56 +08:00
b8zhong	f374623fa9	[Refactor] Set `fp4-gemm-backend=auto` on SM100 and rename `fp4-gemm-backend` with `flashinfer_` prefix (#17309 )	2026-01-19 20:09:07 +08:00
Glen Liu	ad1b4e4728	[Feature] overlap LoRA weight loading with compute (#15512 )	2026-01-19 10:43:17 +08:00
Xinyuan Tong	2069050d3f	fix: Handle multiple named chat templates in HuggingFace tokenizers (#17236 ) Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>	2026-01-18 17:20:04 +08:00
zijiexia	166396ca4c	[Docs] minor update on ep docs (#17242 )	2026-01-16 18:20:47 -08:00
shuwenn	8ec160ed46	feature: support uvicorn access log filter(disable logging /metrics) (#15513 )	2026-01-15 20:00:06 -08:00
Yi Zhong	d1110e1c3e	docs only add kimi k2 thinking and kimi linear (#15789 )	2026-01-15 12:09:52 -05:00
shuwenn	9227d9f60c	[Docs] sort and update `server_arguments.md` (#17163 )	2026-01-15 12:07:18 -05:00
Glen Liu	6b065298b5	[Docs] add routing-key to schedule-policy in docs (#17101 )	2026-01-14 22:22:07 -05:00
fxmarty-amd	5af84c8af5	[AMD][Quantization] Add `int4fp8_moe` online quantization on ROCm (#7392 ) Co-authored-by: Dehua Tang <dehtang@amd.com> Co-authored-by: HAI <hixiao@gmail.com> Co-authored-by: YC Tseng <yctseng@amd.com>	2026-01-14 01:44:40 -08:00
shuwenn	cd33694585	feat: add --admin-api-key for finer-grained endpoint auth (#15908 ) Co-authored-by: Simo Lin <linsimo.mark@gmail.com>	2026-01-13 20:21:55 -08:00
James	ae0baefb94	[NPU] upgrade npu mf_apater plugin (#15853 )	2026-01-13 09:02:10 +08:00
Wenyi Xu	3c16c58619	[model-gateway] Add Redis support as a history backend (#16300 )	2026-01-11 01:03:00 -08:00
Ratish P	c0248d6f37	[dpc]: unify DP controller load balancing and simplify dispatch logic (#16258 )	2026-01-11 12:38:03 +08:00
Shangming Cai	973116e6bb	[Doc] Optimize pipeline parallelism doc (#16630 ) Signed-off-by: Shangming Cai <csmthu@gmail.com>	2026-01-07 14:52:42 +08:00
Huapeng Zhou	078270473a	[Doc] Default lora backend: csgmv (#16444 )	2026-01-05 12:45:49 +08:00

187 Commits