54 Commits

Author SHA1 Message Date
David Cheung
ed427e1299 Migrate all callers from /get_server_info to /server_info (#21463) 2026-04-01 21:17:50 -07:00
zwang86
5fc5c18bed fix(security): replace unsafe pickle.loads with SafeUnpickler for CVE-2026-3989 (#20904) 2026-03-27 00:43:41 -07:00
Ratish P
ae6f6e1495 [Refactor] Benchmark: Add typed DatasetArgs/Loader registry and CPU dataset unit tests (#19147) 2026-02-24 12:22:01 -08:00
Co-authored-by: Liangsheng Yin <lsyincs@gmail.com>
Liangsheng Yin
1f2da824dd [Benchmark] Remove re-exports from bench_serving.py (#19130) 2026-02-21 14:30:30 -08:00
SoluMilken
07a24f1a38 update pre-commit config (#18860) 2026-02-16 00:18:31 +08:00
shuwenn
3299c4f9c1 [CI] feat: add early exit to wait_for_server when process dies (#18602) 2026-02-13 16:46:09 -08:00
cswuyg
33c053c50c fix(benchmark): add missing args for speculative decoding benchmark (#17974) 2026-01-29 23:05:42 -08:00
Chenxi Li
b7c7e03d93 Fix crash dump replay script for image data replay (#16277) 2026-01-02 13:42:22 -08:00
Baizhou Zhang
42fcf5438f Revert "tiny remove deprecated endpoint call" (#14533) 2025-12-05 23:48:54 -08:00
b8zhong
ec7b2c16d9 tiny remove deprecated endpoint call (#13607) 2025-12-05 09:54:49 -08:00
Lzhang-hub
2847e5c4b4 fix bench_speculative bug (#13197) 2025-11-20 17:09:04 +08:00
Xiaoyu Zhang
8b5e2c5368 [Tiny fix] Fix bench_speculative.py run bug (#13416) 2025-11-17 18:58:19 +08:00
Liangsheng Yin
ae7698fbd5 Remove deprecated scripts (#13399) 2025-11-17 16:54:39 +08:00
Zaili Wang
50b6842b4b fix: Add default value for backend in sample_mmmu_requests (#12256) 2025-10-31 19:31:40 +08:00
fzyzcjy
fdc4e1e570 Tiny move files to utils folder (#11166) 2025-10-03 22:40:06 +08:00
Lzhang-hub
4efe2c57c9 support vlm model spec bench (#10173) 2025-09-10 13:37:04 +08:00
Chayenne
9b08d975a0 [docs] Refactor, remove compiled results and add gpt-oss (#9613) 2025-08-25 15:27:06 -07:00
Co-authored-by: zhaochenyang20 <zhaochenyang20@gmail.com>
Lianmin Zheng
c480a3f6ea Minor style fixes for sgl-kernel (#9289) 2025-08-18 09:38:35 -07:00
Kay Yan
975a5ec69c [fix] update bench_speculative.py for compatibility (#7764) 2025-07-04 16:32:54 +08:00
Signed-off-by: Kay Yan <kay.yan@daocloud.io>
Lianmin Zheng
22352d47a9 Improve streaming, log_level, memory report, weight loading, and benchmark script (#7632) 2025-06-29 23:16:19 -07:00
Co-authored-by: Kan Wu <wukanustc@gmail.com>
Lianmin Zheng
0f218731e3 Do not run frontend_reasoning.ipynb to reduce the CI load (#7073) 2025-06-10 17:15:31 -07:00
fzyzcjy
25be63d0b2 Auto handle PD disaggregation in bench_serving (#6587) 2025-05-25 22:41:27 -07:00
Co-authored-by: yizhang2077 <1109276519@qq.com>
Byron Hsu
2d831c6ef9 [PD] Support structured output (#6560) 2025-05-23 21:49:00 -07:00
Byron Hsu
8233cc10fd [PD] Support logprob & Add failure test (#6558) 2025-05-23 14:29:20 -07:00
Yineng Zhang
eabcf82acb feat: add long context example (#6391) 2025-05-18 01:45:17 -07:00
Yineng Zhang
7282ab741a fix: update bench_speculative (#5649) 2025-04-22 16:08:15 -07:00
Byron Hsu
bf98d2e377 [PD] Support prefill overlap + Ensure no race condition (#5609) 2025-04-21 12:12:56 -07:00
Byron Hsu
deded17f38 [PD] Fix edge case and simplify large page size + chunked prefill (#5589) 2025-04-21 10:27:02 -07:00
Byron Hsu
c951d312ed [PD] Fix large page size + chunk prefill (#5588) 2025-04-20 17:21:54 -07:00
Baizhou Zhang
6fb29ffd9e Deprecate enable-flashinfer-mla and enable-flashmla (#5480) 2025-04-17 01:43:33 -07:00
lukec
a53fe428f9 Support FlashMLA backend (#4472) 2025-03-16 09:07:06 -07:00
Co-authored-by: yinfan98 <1106310035@qq.com>
Ke Bao
f1d09a6541 Update bench speculative script (#4235) 2025-03-09 12:19:01 -07:00
Adarsh Shirawalmath
19fd57bcd7 [docs] fix HF reference script command (#4148) 2025-03-06 13:21:54 -08:00
Lianmin Zheng
935cda944b Misc clean up; Remove the support of jump forward (#4032) 2025-03-03 07:02:14 -08:00
Lianmin Zheng
ac2387279e Support penalty in overlap mode; return logprob with chunked prefill; improve benchmark scripts (#3988) 2025-03-03 00:12:04 -08:00
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
Co-authored-by: dhou-xai <dhou@x.ai>
Co-authored-by: Hanming Lu <hanming_lu@berkeley.edu>
Yineng Zhang
bc6ad367c2 fix lint (#2733) 2025-01-05 14:45:42 +08:00
Ce Gao
f5d0865b25 feat: Support VLM in reference_hf (#2726) 2025-01-03 22:32:30 +08:00
Signed-off-by: Ce Gao <gaocegege@hotmail.com>
Ying Sheng
e1e595d702 [feat] Refactor session control interface and add CI (#2173) 2024-11-25 12:32:51 -08:00
Xuehai Pan
62a4a339eb docs: fix module docstrings and copyright headers (#2077) 2024-11-22 22:16:53 +08:00
Byron Hsu
30af7dfb34 [router] add base_gpu_id server args & merged radix tree python reference (#2115) 2024-11-21 17:13:33 -08:00
Lianmin Zheng
56a347f7d3 Move test_session_id.py to playground (#2104) 2024-11-20 01:28:27 -08:00
Ke Bao
62832bb272 Support cuda graph for DP attention (#2061) 2024-11-17 16:29:20 -08:00
Chayenne
c77c1e05ba fix black in pre-commit (#1940) 2024-11-08 07:42:47 +08:00
Xuehai Pan
a5e0defb5a minor: Add basic editorconfig and pre-commit hooks to enforce style for whitespaces (#1926) 2024-11-06 13:46:04 +00:00
Jani Monoses
916b3cdddc Allow passing dtype and max_new_tokens to HF reference script (#1903) 2024-11-03 08:24:37 -08:00
Ying Sheng
c5325aba75 [Profile] Add pytorch profiler (#1604) 2024-10-07 14:37:16 -07:00
Lianmin Zheng
fb2d0680e0 [Fix] Fix clean_up_tokenization_spaces in tokenizer (#1510) 2024-09-24 21:37:33 -07:00
Lianmin Zheng
2854a5ea9f Fix the overhead due to penalizer in bench_latency (#1496) 2024-09-23 07:38:14 -07:00
Lianmin Zheng
167591e864 Better unit tests for adding a new model (#1488) 2024-09-22 01:50:37 -07:00
Ying Sheng
37963394aa [Feature] Support LoRA path renaming and add LoRA serving benchmarks (#1433) 2024-09-15 12:46:04 -07:00