Commit Graph

533 Commits

Author SHA1 Message Date
Baizhou Zhang
b441317aa4 Revert "Upgrade CI default CUDA version from 12.9 to 13.0" (#22727) 2026-04-13 14:39:24 -07:00
Asish Kumar
39810762d2 fix: use describe mode for SGLang version detection (#22600)
Signed-off-by: Asish Kumar <officialasishkumar@gmail.com>
2026-04-13 09:45:45 -07:00
Alison Shao
3f4fbc165d Upgrade CI default CUDA version from 12.9 to 13.0 (#21441) 2026-04-12 21:48:40 -07:00
sglang-bot
df3275bd6c chore: bump flashinfer version to 0.6.7.post3 (#22382)
Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>
2026-04-08 14:49:45 -07:00
Rain Jiang
1a8eb890f6 Kernels community fa3 (#20796) 2026-04-07 12:48:44 -07:00
Ke Bao
be42fbbbd7 Support HTTP2 server (#21700) 2026-04-08 00:42:52 +08:00
Kangyan-Zhou
93109cc89b [Fix] Fix setuptools-scm version resolution for rc tags (#22165)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
2026-04-05 16:55:32 -07:00
sglang-bot
46bf19cdab chore: bump flashinfer version to 0.6.7.post2 (#22097)
Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>
2026-04-04 02:16:25 -07:00
sglang-bot
84118acf50 chore: bump sglang-kernel version to 0.4.1 (#22009)
Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>
2026-04-03 13:58:35 -07:00
Noa Neria
8d9145d97e Direct model loading from object storage with Runai Model Streamer (#17948)
Signed-off-by: Noa Neria <noa@run.ai>
2026-04-01 18:41:22 -07:00
Alison Shao
1ac74e652e [Misc] Fix comparator e2e tests: add polars dep + fix dp-attention test (#21804)
Co-authored-by: Alison Shao <alison.shao@mac.lan>
2026-04-01 15:44:35 -07:00
sglang-bot
ca3ba05a7a chore: bump flashinfer version to 0.6.7 (#21422)
Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
2026-03-31 21:18:16 -07:00
Anant Sharma
f289d173aa [Deps] Bump xgrammar to 0.1.32 (#21032) 2026-03-26 01:22:37 -07:00
Alison Shao
5297a3cb46 [CI] Rewrite killall_sglang as Python with CI/local dual mode (#21331)
Co-authored-by: Alison Shao <alison.shao@mac.lan>
Co-authored-by: Alison Shao <alison.shao@MacBook-Pro-D2W773R9CD.local>
Co-authored-by: hnyls2002 <lsyincs@gmail.com>
Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>
2026-03-24 23:54:01 -07:00
Xinyuan Tong
d1e95af282 Upgrade transformers==5.3.0 (#17784)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Co-authored-by: Kangyan-Zhou <zky314343421@gmail.com>
Co-authored-by: Alison Shao <alisonshao@mac.lan>
Co-authored-by: Mick <mickjagger19@icloud.com>
2026-03-18 13:50:43 -07:00
Rain Jiang
cb1e63aba4 bump fa4 to official released fa4 pkg (#20303) 2026-03-17 17:22:56 -07:00
DefTruth
025691cd9e [diffusion] chore: bump up cache-dit & support quant for diffusers backend (#20361) 2026-03-17 12:51:31 +08:00
Xiaoyu Zhang
15097c5c3b Release sglang kernel 0.4.0 (#20440)
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
2026-03-16 20:34:58 +08:00
Ke Bao
e2be31824f [CI] Add ut coverage tool (#20628) 2026-03-15 21:13:45 +08:00
sglang-bot
93afe15b43 chore: bump flashinfer version to 0.6.6 (#20480)
Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>
2026-03-14 13:05:10 -07:00
Simo Lin
654fc02cf1 [gRPC] Extract gRPC servicer into standalone package (#20478)
Signed-off-by: Simo Lin <linsimo.mark@gmail.com>
2026-03-13 09:13:29 -07:00
Yuhao Yang
a57a44739f [diffusion] deps: upgrade diffusers from 0.36.0 to 0.37.0 (#20318) 2026-03-12 19:17:28 +08:00
Rain Jiang
61b228239e bump sgl-fa4 version to 4.0.5 to loose torch deps (#20378) 2026-03-11 13:08:09 -07:00
Xiaoyu Zhang
680d9d98e4 Fix cutedsl ci error (#20309)
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
2026-03-11 16:17:35 +08:00
Xinyuan Tong
4a757990a1 [VLM] Replace decord with torchcodec for video decoding (#20055)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: BakerBunker <17872844+BakerBunker@users.noreply.github.com>
2026-03-09 19:23:49 +08:00
Xinyu Zhang
b3cfad0a80 Add Ray actor support for scheduler process management (DP=1) (#17684)
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-03-05 13:21:23 -08:00
Rain Jiang
472eef4071 fa4 cleanup (#19727) 2026-03-05 17:54:25 +08:00
Kangyan-Zhou
198381d9ce Add SSL/TLS support for HTTP and gRPC servers (#18973)
Co-authored-by: guys@spotify.com
2026-03-04 19:27:16 -08:00
Jasonzhang517
d939e26585 [model gateway][0/N] router EPD support: add encoder grpc server backend support (#16552)
Co-authored-by: Zongyao Chen <ZongYao.Chen@linux.alibaba.com>
Co-authored-by: Zongyao Chen <solar1s@163.com>
2026-03-03 19:38:15 +08:00
Mohammad Miadh Angkad
6822941514 [FlashInfer] Bump FlashInfer version from 0.6.3 to 0.6.4 (#19005) 2026-03-02 16:12:09 -08:00
Prozac614
57c5c343d7 [diffusion] model: support Hunyuan3D-2 (#18170)
Co-authored-by: yingluosanqian <yingluosanqian@gmail.com>
Co-authored-by: daiweitao <dwti614707404@163.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
2026-03-02 12:28:05 +08:00
DefTruth
78d6674c45 [diffusion] feat: support hybrid parallelism for diffusers backend (#19405) 2026-02-27 00:06:08 +08:00
Mick
241ee90164 [diffusion] chore: tiny fix pyproject.toml (#19256) 2026-02-25 11:57:53 +08:00
GMI Xiao Jin
fcfd964d7d [diffusion] model: LTX-2 Support PR3 (#19151) 2026-02-24 16:55:28 +08:00
Mohammad Miadh Angkad
1be41e9036 [FlashInfer] Bump FlashInfer version from 0.6.2 to 0.6.3 (#18448) 2026-02-14 07:43:33 +08:00
Simo Lin
92c5749f41 refactor: replace local proto compilation with smg-grpc-proto package (#18682) 2026-02-12 05:29:24 -08:00
shaharmor98
c6aa1863be Add Nemotron 3 Nano tests (#18119)
Signed-off-by: Shahar Mor <smor@nvidia.com>
2026-02-06 23:55:42 +08:00
linhaifeng
c1d5cc3b24 [Bugfix] fix a obvious logic error (#18254) 2026-02-04 13:59:58 -08:00
Mick
977096ae03 [diffusion] cli: introduce generic attention backend configuration in ServerArgs (#18036) 2026-02-02 09:47:40 +08:00
Baizhou Zhang
c7d53fa26a Set torch url index in pyproject.toml (#16802) 2026-02-01 13:23:52 +08:00
Prozac614
3fcda00e8c [CI] Fix CI timeouts by upgrading runai_model_streamer (related to #16937) (#17636) 2026-01-28 17:09:45 -08:00
shaharmor98
f6f1b6d000 Bump FI version (#17700)
Signed-off-by: Shahar Mor <smor@nvidia.com>
Co-authored-by: b8zhong <b8zhong@uwaterloo.ca>
2026-01-26 16:50:06 +08:00
Kangyan-Zhou
48f4340b14 Exclude some diffusion package for ARM in docker release (#17745) 2026-01-25 23:32:39 -08:00
Kangyan-Zhou
8d3e1ac0c8 Add an all type in pyproject.tml to include diffusion support (#17697) 2026-01-25 12:52:13 -08:00
Chi McIsaac
71482dd171 [diffusion] feat: enable passing Cache‑DiT config for diffusers backend (#16662)
Signed-off-by: Chi <chixie.mcisaac@gmail.com>
Signed-off-by: qimcis <chixie.mcisaac@gmail.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
2026-01-22 13:13:34 +08:00
Baizhou Zhang
fafa171529 [hotfix] Fixes on cuda 13 docker image (#17541)
Co-authored-by: iforgetmyname <iforgetmyname@users.noreply.github>
2026-01-22 12:29:55 +08:00
Lianmin Zheng
b74a57a8d9 [Auto Sync] Update detokenizer_manager.py, io_struct.py, mu... (20260120) (#17442)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Wangfan Fu <wangfan@x.ai>
2026-01-21 14:48:32 -08:00
DarkSharpness
95f59c13fd [Chore] include all jit files in building packages (#17493) 2026-01-21 14:48:02 -08:00
Jacob Gordon
cda43ffa4d ci: avoids duplication of codespell config (#17519) 2026-01-21 12:02:37 -08:00
Baizhou Zhang
ea879c7739 [Minor] Correct sglang version when installing from source (#17315) 2026-01-18 19:36:16 -08:00