536 Commits

Author SHA1 Message Date
R0CKSTAR
b07c5e4080 Pin uvloop to 0.21.0 (#12279)
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
2025-11-07 03:33:31 +08:00
sglang-bot
0c006b8809 chore: bump SGLang version to 0.5.5 (#12739) 2025-11-07 00:46:19 +08:00
gongwei-130
97be66c358 fix sgl-kernel version (#12723) 2025-11-05 19:01:03 -08:00
Mick
7bc1dae095 WIP: initial multimodal-gen support (#12484)
Co-authored-by: yhyang201 <yhyang201@gmail.com>
Co-authored-by: yizhang2077 <1109276519@qq.com>
Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Co-authored-by: ispobock <ispobaoke@gmail.com>
Co-authored-by: JiLi <leege233@gmail.com>
Co-authored-by: CHEN Xi <78632976+RubiaCx@users.noreply.github.com>
Co-authored-by: laixin <xielx@shanghaitech.edu.cn>
Co-authored-by: SolitaryThinker <wlsaidhi@gmail.com>
Co-authored-by: jzhang38 <a1286225768@gmail.com>
Co-authored-by: BrianChen1129 <yongqichcd@gmail.com>
Co-authored-by: Kevin Lin <42618777+kevin314@users.noreply.github.com>
Co-authored-by: Edenzzzz <wtan45@wisc.edu>
Co-authored-by: rlsu9 <r3su@ucsd.edu>
Co-authored-by: Jinzhe Pan <48981407+eigensystem@users.noreply.github.com>
Co-authored-by: foreverpiano <pianoqwz@qq.com>
Co-authored-by: RandNMR73 <notomatthew31@gmail.com>
Co-authored-by: PorridgeSwim <yz3883@columbia.edu>
Co-authored-by: Jiali Chen <90408393+gary-chenjl@users.noreply.github.com>
2025-11-05 12:28:52 -08:00
sglang-bot
09938e1f82 chore: bump SGLang version to 0.5.4.post3 (#12639) 2025-11-04 18:32:11 -08:00
Baizhou Zhang
6e29446e45 [hotfix] Remove flashinfer-jit-cache from pyproject (#12530) 2025-11-02 22:11:05 -08:00
Yineng Zhang
0c3543d7d5 chore: upgrade flashinfer 0.5.0 (#12523)
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
2025-11-02 20:54:12 -08:00
sglang-bot
41c10e67fc chore: bump SGLang version to 0.5.4.post2 (#12439) 2025-10-31 17:38:50 -07:00
Baizhou Zhang
587deb15a7 [hotfix] Fix pytest not found in CI (#12311) 2025-10-29 11:07:36 +08:00
ishandhanani
285a8e6986 docker: add CUDA13 support in dockerfile and update GDRCopy/NVSHMEM for blackwell support (#11517)
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
2025-10-27 22:00:54 -07:00
Xinyuan Tong
729f612dc6 Update openai package version to 2.6.1 (#12222) 2025-10-28 11:23:40 +08:00
sglang-bot
55d75e11bd chore: bump SGLang version to 0.5.4.post1 (#12169) 2025-10-27 09:35:20 +08:00
Liangsheng Yin
8491c794ad [misc] depdencies & enviroment flag (#12113) 2025-10-26 14:52:35 +08:00
Baizhou Zhang
4b0ac1d52a Update sgl-kernel version to 0.3.16.post4 (#12125) 2025-10-25 14:33:33 -07:00
Muqi Li
b04cd3d487 Add 'gguf' to project dependencies (#12046) 2025-10-24 17:16:19 +08:00
sglang-bot
1053e1be17 chore: bump SGLang version to 0.5.4 (#12027)
Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>
2025-10-23 18:01:40 -07:00
Teng Ma
96a5e4dd79 [Feature] Support loading weights from ckpt engine worker (#11755)
Signed-off-by: Yang Kaiyong <yangkaiyong.yky@antgroup.com>
Signed-off-by: Cruz Zhao <CruzZhao@linux.alibaba.com>
Signed-off-by: Xuchun Shang <xuchun.shang@gmail.com>
Co-authored-by: Yang Kaiyong <yangkaiyong.yky@antgroup.com>
Co-authored-by: Cruz Zhao <CruzZhao@linux.alibaba.com>
Co-authored-by: Xuchun Shang <xuchun.shang@gmail.com>
Co-authored-by: Shangming Cai <csmthu@gmail.com>
2025-10-23 09:23:30 -07:00
Chang Su
6ade6a02d4 [grpc] Support gRPC standard health check (#11955) 2025-10-22 16:59:09 -07:00
Zhiyu
80b2b3207a Enable native ModelOpt quantization support (3/3) (#10154)
Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
2025-10-21 21:44:29 -07:00
Yineng Zhang
9792b9d7e3 chore: upgrade flashinfer 0.4.1 (#11933) 2025-10-21 14:46:31 -07:00
Baizhou Zhang
ebff4ee648 Update sgl-kernel and remove fast hadamard depedency (#11844) 2025-10-21 13:13:54 -07:00
fzyzcjy
a7043c6f0d Bump torch_memory_saver to avoid installing pre-release versions (#11797) 2025-10-18 01:20:42 -07:00
Lianmin Zheng
67e34c56d7 Fix install instructions and pyproject.tomls (#11781) 2025-10-18 01:08:01 -07:00
sglang-bot
85ebeecf06 chore: bump SGLang version to 0.5.3.post3 (#11693)
Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>
2025-10-16 13:14:55 -07:00
sglang-bot
baf277a9bf chore: bump SGLang version to 0.5.3.post2 (#11680)
Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>
2025-10-15 16:49:14 -07:00
Sahithi Chigurupati
e9e120ac7a fix: upgrade transformers to 4.57.1 (#11628)
Signed-off-by: Sahithi Chigurupati <chigurupati.sahithi@gmail.com>
Co-authored-by: zhyncs <me@zhyncs.com>
2025-10-14 18:35:05 -07:00
Johnny
cb8f3d90d3 [NVIDIA] update pyproject.toml to support cu130 option (#11521) 2025-10-13 13:03:31 -07:00
ai-jz
9cc1e065f1 [router][Fix] Include grpc reflection runtime dependency (#11419)
Co-authored-by: Chang Su <chang.s.su@oracle.com>
2025-10-13 09:32:42 -07:00
Lianmin Zheng
548a57b1f3 Fix port conflicts in CI (#11497) 2025-10-12 06:46:36 -07:00
sglang-bot
758b887ad1 chore: bump SGLang version to 0.5.3.post1 (#11324) 2025-10-09 15:19:59 -07:00
Yineng Zhang
44cb060785 chore: upgrade flashinfer 0.4.0 (#11364) 2025-10-09 14:17:54 -07:00
Lifu Huang
edefab0c64 [2/2] Support MHA prefill with FlashAttention 4. (#10937)
Co-authored-by: Hieu Pham <hyhieu@gmail.com>
2025-10-08 00:54:20 -07:00
DarkSharpness
832c84fba9 [Chore] Update xgrammar 0.1.24 -> 0.1.25 (#10710) 2025-10-07 18:22:28 -07:00
sglang-bot
a4a3d82393 chore: bump SGLang version to 0.5.3 (#11263) 2025-10-06 20:07:02 +08:00
sglang-bot
0b13cbb7c9 chore: bump SGLang version to 0.5.3rc2 (#11259)
Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>
2025-10-06 01:12:10 -07:00
Lianmin Zheng
f8924ad74b update sgl kernel version to 0.3.14.post1 (#11242) 2025-10-05 20:30:40 -07:00
fzyzcjy
2f80bd9f0e Bump torch_memory_saver 0.0.9rc2 (#11252) 2025-10-05 20:26:20 -07:00
Lianmin Zheng
d645ae90a3 Rename runner labels (#11228) 2025-10-05 18:05:41 -07:00
Xinyuan Tong
652c24a653 Update transformers package version to 4.57.0 (#11222)
Co-authored-by: yhyang201 <yhyang201@gmail.com>
2025-10-05 23:45:14 +00:00
Simo Lin
d736e0b65e [router] add grpc router pd mode for chat and generate (#11140) 2025-10-04 06:58:28 -07:00
Matt Nappo
8c57490210 [Feature] Option to save model weights to CPU when memory saver mode is enabled (#10873)
Co-authored-by: molocule <34072934+molocule@users.noreply.github.com>
2025-10-03 16:48:19 +08:00
Liangsheng Yin
bfcd9b2433 [grpc] style fix for grpc compilation. (#11175) 2025-10-03 01:44:29 +08:00
Lianmin Zheng
2d62af6be5 Fix metrics and request tracing (TimeStats) (#11123) 2025-10-01 13:03:07 -07:00
eigen
ac1f2928ae feat: add fast_decode_plan from flashinfer, flashinfer to 0.4.0rc3 (#10760)
Co-authored-by: Zihao Ye <yezihhhao@gmail.com>
Co-authored-by: Sleepcoo <Sleepcoo@gmail.com>
2025-10-01 02:56:13 -07:00
Yineng Zhang
3a641d9085 chore: upgrade sgl-kernel 0.3.13 (#11056) 2025-09-29 02:22:25 -07:00
Yineng Zhang
5942fdb480 chore: upgrade cutedsl 4.2.1 (#11054) 2025-09-29 00:24:17 -07:00
Zhihao Zhang
24f7cb1ece [speculative decoding] rename lookahead to ngram (#11010)
Co-authored-by: a4zhangfei <a4zhangfei@qq.com>
2025-09-28 21:06:59 -07:00
Yineng Zhang
8c1ef0f914 chore: upgrade sgl-kernel 0.3.12 (#10782) 2025-09-23 00:18:54 -07:00
Yineng Zhang
6f993e8b9e chore: cleanup docker image (#10671) 2025-09-19 16:56:49 -07:00
Baizhou Zhang
3fa3c22ae2 Fix fast decode plan for flashinfer v0.4.0rc1 and upgrade sgl-kernel 0.3.11 (#10634)
Co-authored-by: zhyncs <me@zhyncs.com>
2025-09-19 01:25:29 -07:00