545 Commits

Author SHA1 Message Date
Liangsheng Yin
35870d55ac Deepseek V4 (#23882)
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: fzyzcjy <ch271828n@outlook.com>
Co-authored-by: ispobock <ispobaoke@gmail.com>
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
Co-authored-by: yueming-yuan <yym022502@gmail.com>
Co-authored-by: DarkSharpness <2040703891@qq.com>
Co-authored-by: Yuhao Yang <47235274+yhyang201@users.noreply.github.com>
Co-authored-by: yhyang201 <yhyang201@users.noreply.github.com>
Co-authored-by: yhyang201 <yhyang201@gmail.com>
Co-authored-by: Qiaolin Yu <90088090+qiaolin-yu@users.noreply.github.com>
Co-authored-by: Ethan (Yusheng) Su <11704492+yushengsu-thu@users.noreply.github.com>
Co-authored-by: Mingyi <27337995+wisclmy0611@users.noreply.github.com>
Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>
Co-authored-by: Yihao Wang <42559837+againstentropy@users.noreply.github.com>
2026-05-07 18:32:21 -07:00
Baizhou Zhang
ecb786c8d7 [Kernel] Deprecate DeepGemm in sgl kernel and apply custom wheel sgl-deep-gemm (#24268) 2026-05-06 18:59:01 -07:00
Linzhang Li
952b3caf18 feat: use structural tags to enable strict tool calling and reasoning for more models (#21722)
Signed-off-by: Yuchuan <yuchuan.7streams@gmail.com>
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
Co-authored-by: Ubospica <ubospica@gmail.com>
Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
2026-05-04 02:30:28 -07:00
Brayden Zhong
88bb5dffe4 [Dependency] Upgrade to Torch 2.11.0 (#21247)
Co-authored-by: Kangyan Zhou <zky314343421@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
Co-authored-by: b8zhong <b8zhong@users.noreply.github.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
2026-05-02 12:25:36 -07:00
Kangyan-Zhou
cd27baaffd [ci][cu13] Bump torch_memory_saver to 0.0.9.post1; restore manual tests (#23182)
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 22:50:38 -07:00
AlonKejzman
66ea0aee7f tokenizer: Add fastokens support (#23753) 2026-04-28 11:43:10 -07:00
Xinyuan Tong
e5198386bd Upgrade transformers from 5.5.4 to 5.6.0 (#23525) 2026-04-26 22:33:54 -07:00
sglang-bot
9003f24e2b chore: bump sglang-kernel version to 0.4.1.post1 (#23733)
Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>
Co-authored-by: Kangyan Zhou <zky314343421@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 23:23:49 -07:00
sglang-bot
f3b88e080a chore: bump flashinfer version to 0.6.8.post1 (#23281)
Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>
2026-04-23 15:23:03 -07:00
Alex Nails
10e17cc55e [gRPC] Native gRPC server: proto + Rust crate scaffold + server args (#22736) 2026-04-20 12:39:35 +08:00
Baizhou Zhang
6ecd6f84db [CI] Add per-job uv venv isolation and upgrade CI version to Cuda 13 (#23119)
Co-authored-by: Kangyan Zhou <zky314343421@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Alison Shao <a.shao@wustl.edu>
Co-authored-by: Mick <mickjagger19@icloud.com>
2026-04-19 05:32:36 -07:00
Xinyuan Tong
34fef07a15 Upgrade transformers to 5.5.3 and refactor hf_transformers_utils into subpackage (#21569) 2026-04-15 20:03:44 -07:00
Baizhou Zhang
b441317aa4 Revert "Upgrade CI default CUDA version from 12.9 to 13.0" (#22727) 2026-04-13 14:39:24 -07:00
Asish Kumar
39810762d2 fix: use describe mode for SGLang version detection (#22600)
Signed-off-by: Asish Kumar <officialasishkumar@gmail.com>
2026-04-13 09:45:45 -07:00
Alison Shao
3f4fbc165d Upgrade CI default CUDA version from 12.9 to 13.0 (#21441) 2026-04-12 21:48:40 -07:00
sglang-bot
df3275bd6c chore: bump flashinfer version to 0.6.7.post3 (#22382)
Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>
2026-04-08 14:49:45 -07:00
Rain Jiang
1a8eb890f6 Kernels community fa3 (#20796) 2026-04-07 12:48:44 -07:00
Ke Bao
be42fbbbd7 Support HTTP2 server (#21700) 2026-04-08 00:42:52 +08:00
Kangyan-Zhou
93109cc89b [Fix] Fix setuptools-scm version resolution for rc tags (#22165)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
2026-04-05 16:55:32 -07:00
sglang-bot
46bf19cdab chore: bump flashinfer version to 0.6.7.post2 (#22097)
Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>
2026-04-04 02:16:25 -07:00
sglang-bot
84118acf50 chore: bump sglang-kernel version to 0.4.1 (#22009)
Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>
2026-04-03 13:58:35 -07:00
Noa Neria
8d9145d97e Direct model loading from object storage with Runai Model Streamer (#17948)
Signed-off-by: Noa Neria <noa@run.ai>
2026-04-01 18:41:22 -07:00
Alison Shao
1ac74e652e [Misc] Fix comparator e2e tests: add polars dep + fix dp-attention test (#21804)
Co-authored-by: Alison Shao <alison.shao@mac.lan>
2026-04-01 15:44:35 -07:00
sglang-bot
ca3ba05a7a chore: bump flashinfer version to 0.6.7 (#21422)
Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
2026-03-31 21:18:16 -07:00
Anant Sharma
f289d173aa [Deps] Bump xgrammar to 0.1.32 (#21032) 2026-03-26 01:22:37 -07:00
Alison Shao
5297a3cb46 [CI] Rewrite killall_sglang as Python with CI/local dual mode (#21331)
Co-authored-by: Alison Shao <alison.shao@mac.lan>
Co-authored-by: Alison Shao <alison.shao@MacBook-Pro-D2W773R9CD.local>
Co-authored-by: hnyls2002 <lsyincs@gmail.com>
Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>
2026-03-24 23:54:01 -07:00
Xinyuan Tong
d1e95af282 Upgrade transformers==5.3.0 (#17784)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Co-authored-by: Kangyan-Zhou <zky314343421@gmail.com>
Co-authored-by: Alison Shao <alisonshao@mac.lan>
Co-authored-by: Mick <mickjagger19@icloud.com>
2026-03-18 13:50:43 -07:00
Rain Jiang
cb1e63aba4 bump fa4 to official released fa4 pkg (#20303) 2026-03-17 17:22:56 -07:00
DefTruth
025691cd9e [diffusion] chore: bump up cache-dit & support quant for diffusers backend (#20361) 2026-03-17 12:51:31 +08:00
Xiaoyu Zhang
15097c5c3b Release sglang kernel 0.4.0 (#20440)
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
2026-03-16 20:34:58 +08:00
Ke Bao
e2be31824f [CI] Add ut coverage tool (#20628) 2026-03-15 21:13:45 +08:00
sglang-bot
93afe15b43 chore: bump flashinfer version to 0.6.6 (#20480)
Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>
2026-03-14 13:05:10 -07:00
Simo Lin
654fc02cf1 [gRPC] Extract gRPC servicer into standalone package (#20478)
Signed-off-by: Simo Lin <linsimo.mark@gmail.com>
2026-03-13 09:13:29 -07:00
Yuhao Yang
a57a44739f [diffusion] deps: upgrade diffusers from 0.36.0 to 0.37.0 (#20318) 2026-03-12 19:17:28 +08:00
Rain Jiang
61b228239e bump sgl-fa4 version to 4.0.5 to loose torch deps (#20378) 2026-03-11 13:08:09 -07:00
Xiaoyu Zhang
680d9d98e4 Fix cutedsl ci error (#20309)
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
2026-03-11 16:17:35 +08:00
Xinyuan Tong
4a757990a1 [VLM] Replace decord with torchcodec for video decoding (#20055)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: BakerBunker <17872844+BakerBunker@users.noreply.github.com>
2026-03-09 19:23:49 +08:00
Xinyu Zhang
b3cfad0a80 Add Ray actor support for scheduler process management (DP=1) (#17684)
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-03-05 13:21:23 -08:00
Rain Jiang
472eef4071 fa4 cleanup (#19727) 2026-03-05 17:54:25 +08:00
Kangyan-Zhou
198381d9ce Add SSL/TLS support for HTTP and gRPC servers (#18973)
Co-authored-by: guys@spotify.com
2026-03-04 19:27:16 -08:00
Jasonzhang517
d939e26585 [model gateway][0/N] router EPD support: add encoder grpc server backend support (#16552)
Co-authored-by: Zongyao Chen <ZongYao.Chen@linux.alibaba.com>
Co-authored-by: Zongyao Chen <solar1s@163.com>
2026-03-03 19:38:15 +08:00
Mohammad Miadh Angkad
6822941514 [FlashInfer] Bump FlashInfer version from 0.6.3 to 0.6.4 (#19005) 2026-03-02 16:12:09 -08:00
Prozac614
57c5c343d7 [diffusion] model: support Hunyuan3D-2 (#18170)
Co-authored-by: yingluosanqian <yingluosanqian@gmail.com>
Co-authored-by: daiweitao <dwti614707404@163.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
2026-03-02 12:28:05 +08:00
DefTruth
78d6674c45 [diffusion] feat: support hybrid parallelism for diffusers backend (#19405) 2026-02-27 00:06:08 +08:00
Mick
241ee90164 [diffusion] chore: tiny fix pyproject.toml (#19256) 2026-02-25 11:57:53 +08:00
GMI Xiao Jin
fcfd964d7d [diffusion] model: LTX-2 Support PR3 (#19151) 2026-02-24 16:55:28 +08:00
Mohammad Miadh Angkad
1be41e9036 [FlashInfer] Bump FlashInfer version from 0.6.2 to 0.6.3 (#18448) 2026-02-14 07:43:33 +08:00
Simo Lin
92c5749f41 refactor: replace local proto compilation with smg-grpc-proto package (#18682) 2026-02-12 05:29:24 -08:00
shaharmor98
c6aa1863be Add Nemotron 3 Nano tests (#18119)
Signed-off-by: Shahar Mor <smor@nvidia.com>
2026-02-06 23:55:42 +08:00
linhaifeng
c1d5cc3b24 [Bugfix] fix a obvious logic error (#18254) 2026-02-04 13:59:58 -08:00