David Cheung | ed427e1299 | Migrate all callers from /get_server_info to /server_info (#21463) | 2026-04-01 21:17:50 -07:00
zwang86 | 5fc5c18bed | fix(security): replace unsafe pickle.loads with SafeUnpickler for CVE-2026-3989 (#20904) | 2026-03-27 00:43:41 -07:00
Ratish P | ae6f6e1495 | [Refactor] Benchmark: Add typed DatasetArgs/Loader registry and CPU dataset unit tests (#19147) Co-authored-by: Liangsheng Yin <lsyincs@gmail.com> | 2026-02-24 12:22:01 -08:00
Liangsheng Yin | 1f2da824dd | [Benchmark] Remove re-exports from bench_serving.py (#19130) | 2026-02-21 14:30:30 -08:00
SoluMilken | 07a24f1a38 | update pre-commit config (#18860) | 2026-02-16 00:18:31 +08:00
shuwenn | 3299c4f9c1 | [CI] feat: add early exit to wait_for_server when process dies (#18602) | 2026-02-13 16:46:09 -08:00
cswuyg | 33c053c50c | fix(benchmark): add missing args for speculative decoding benchmark (#17974) | 2026-01-29 23:05:42 -08:00
Chenxi Li | b7c7e03d93 | Fix crash dump replay script for image data replay (#16277) | 2026-01-02 13:42:22 -08:00
Baizhou Zhang | 42fcf5438f | Revert "tiny remove deprecated endpoint call" (#14533) | 2025-12-05 23:48:54 -08:00
b8zhong | ec7b2c16d9 | tiny remove deprecated endpoint call (#13607) | 2025-12-05 09:54:49 -08:00
Lzhang-hub | 2847e5c4b4 | fix bench_speculative bug (#13197) | 2025-11-20 17:09:04 +08:00
Xiaoyu Zhang | 8b5e2c5368 | [Tiny fix] Fix bench_speculative.py run bug (#13416) | 2025-11-17 18:58:19 +08:00
Liangsheng Yin | ae7698fbd5 | Remove deprecated scripts (#13399) | 2025-11-17 16:54:39 +08:00
Zaili Wang | 50b6842b4b | fix: Add default value for backend in sample_mmmu_requests (#12256) | 2025-10-31 19:31:40 +08:00
fzyzcjy | fdc4e1e570 | Tiny move files to utils folder (#11166) | 2025-10-03 22:40:06 +08:00
Lzhang-hub | 4efe2c57c9 | support vlm model spec bench (#10173) | 2025-09-10 13:37:04 +08:00
Chayenne | 9b08d975a0 | [docs] Refactor, remove compiled results and add gpt-oss (#9613) Co-authored-by: zhaochenyang20 <zhaochenyang20@gmail.com> | 2025-08-25 15:27:06 -07:00
Lianmin Zheng | c480a3f6ea | Minor style fixes for sgl-kernel (#9289) | 2025-08-18 09:38:35 -07:00
Kay Yan | 975a5ec69c | [fix] update bench_speculative.py for compatibility (#7764) Signed-off-by: Kay Yan <kay.yan@daocloud.io> | 2025-07-04 16:32:54 +08:00
Lianmin Zheng | 22352d47a9 | Improve streaming, log_level, memory report, weight loading, and benchmark script (#7632) Co-authored-by: Kan Wu <wukanustc@gmail.com> | 2025-06-29 23:16:19 -07:00
Lianmin Zheng | 0f218731e3 | Do not run frontend_reasoning.ipynb to reduce the CI load (#7073) | 2025-06-10 17:15:31 -07:00
fzyzcjy | 25be63d0b2 | Auto handle PD disaggregation in bench_serving (#6587) Co-authored-by: yizhang2077 <1109276519@qq.com> | 2025-05-25 22:41:27 -07:00
Byron Hsu | 2d831c6ef9 | [PD] Support structured output (#6560) | 2025-05-23 21:49:00 -07:00
Byron Hsu | 8233cc10fd | [PD] Support logprob & Add failure test (#6558) | 2025-05-23 14:29:20 -07:00
Yineng Zhang | eabcf82acb | feat: add long context example (#6391) | 2025-05-18 01:45:17 -07:00
Yineng Zhang | 7282ab741a | fix: update bench_speculative (#5649) | 2025-04-22 16:08:15 -07:00
Byron Hsu | bf98d2e377 | [PD] Support prefill overlap + Ensure no race condition (#5609) | 2025-04-21 12:12:56 -07:00
Byron Hsu | deded17f38 | [PD] Fix edge case and simplify large page size + chunked prefill (#5589) | 2025-04-21 10:27:02 -07:00
Byron Hsu | c951d312ed | [PD] Fix large page size + chunk prefill (#5588) | 2025-04-20 17:21:54 -07:00
Baizhou Zhang | 6fb29ffd9e | Deprecate enable-flashinfer-mla and enable-flashmla (#5480) | 2025-04-17 01:43:33 -07:00
lukec | a53fe428f9 | Support FlashMLA backend (#4472) Co-authored-by: yinfan98 <1106310035@qq.com> | 2025-03-16 09:07:06 -07:00
Ke Bao | f1d09a6541 | Update bench speculative script (#4235) | 2025-03-09 12:19:01 -07:00
Adarsh Shirawalmath | 19fd57bcd7 | [docs] fix HF reference script command (#4148) | 2025-03-06 13:21:54 -08:00
Lianmin Zheng | 935cda944b | Misc clean up; Remove the support of jump forward (#4032) | 2025-03-03 07:02:14 -08:00
Lianmin Zheng | ac2387279e | Support penalty in overlap mode; return logprob with chunked prefill; improve benchmark scripts (#3988) Co-authored-by: SangBin Cho <rkooo567@gmail.com> Co-authored-by: dhou-xai <dhou@x.ai> Co-authored-by: Hanming Lu <hanming_lu@berkeley.edu> | 2025-03-03 00:12:04 -08:00
Yineng Zhang | bc6ad367c2 | fix lint (#2733) | 2025-01-05 14:45:42 +08:00
Ce Gao | f5d0865b25 | feat: Support VLM in reference_hf (#2726) Signed-off-by: Ce Gao <gaocegege@hotmail.com> | 2025-01-03 22:32:30 +08:00
Ying Sheng | e1e595d702 | [feat] Refactor session control interface and add CI (#2173) | 2024-11-25 12:32:51 -08:00
Xuehai Pan | 62a4a339eb | docs: fix module docstrings and copyright headers (#2077) | 2024-11-22 22:16:53 +08:00
Byron Hsu | 30af7dfb34 | [router] add base_gpu_id server args & merged radix tree python reference (#2115) | 2024-11-21 17:13:33 -08:00
Lianmin Zheng | 56a347f7d3 | Move test_session_id.py to playground (#2104) | 2024-11-20 01:28:27 -08:00
Ke Bao | 62832bb272 | Support cuda graph for DP attention (#2061) | 2024-11-17 16:29:20 -08:00
Chayenne | c77c1e05ba | fix black in pre-commit (#1940) | 2024-11-08 07:42:47 +08:00
Xuehai Pan | a5e0defb5a | minor: Add basic editorconfig and pre-commit hooks to enforce style for whitespaces (#1926) | 2024-11-06 13:46:04 +00:00
Jani Monoses | 916b3cdddc | Allow passing dtype and max_new_tokens to HF reference script (#1903) | 2024-11-03 08:24:37 -08:00
Ying Sheng | c5325aba75 | [Profile] Add pytorch profiler (#1604) | 2024-10-07 14:37:16 -07:00
Lianmin Zheng | fb2d0680e0 | [Fix] Fix clean_up_tokenization_spaces in tokenizer (#1510) | 2024-09-24 21:37:33 -07:00
Lianmin Zheng | 2854a5ea9f | Fix the overhead due to penalizer in bench_latency (#1496) | 2024-09-23 07:38:14 -07:00
Lianmin Zheng | 167591e864 | Better unit tests for adding a new model (#1488) | 2024-09-22 01:50:37 -07:00
Ying Sheng | 37963394aa | [Feature] Support LoRA path renaming and add LoRA serving benchmarks (#1433) | 2024-09-15 12:46:04 -07:00