Mirror of https://github.com/theroyallab/tabbyAPI.git, synced 2026-04-28 02:01:24 +00:00
Model: Adjust max output len
The max output length should be hardcoded to 16, since it is the number of tokens predicted per forward pass. 16 is a good value for both normal inference and speculative decoding, and it also saves VRAM compared to the previous default of 2048.

Signed-off-by: kingbri <bdashore3@proton.me>
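To see why this saves VRAM, note that the output logits buffer scales with the output length times the vocabulary size. The sketch below is illustrative only: the vocab size and fp16 element size are assumptions for a typical Llama-style model, not values taken from this commit.

```python
# Illustrative sketch (not tabbyAPI code): rough size of the logits
# buffer as a function of max output length. Assumes a 32000-entry
# vocabulary and 2-byte (fp16) logits, which are hypothetical defaults.

def logits_buffer_bytes(max_output_len: int,
                        vocab_size: int = 32000,
                        bytes_per_value: int = 2) -> int:
    """Approximate logits buffer size for one forward pass."""
    return max_output_len * vocab_size * bytes_per_value

old = logits_buffer_bytes(2048)  # previous default
new = logits_buffer_bytes(16)    # value hardcoded by this commit
print(f"2048 tokens: {old / 2**20:.1f} MiB")  # ~125 MiB
print(f"  16 tokens: {new / 2**20:.1f} MiB")  # ~1 MiB
```

Under these assumed sizes, dropping from 2048 to 16 shrinks the buffer by roughly two orders of magnitude, while 16 still covers the tokens produced per forward pass during normal and speculative decoding.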
@@ -143,6 +143,10 @@ class ExllamaV2Container:
         # Make the max seq len 4096 before preparing the config
         # This is a better default than 2038
         self.config.max_seq_len = 4096
+
+        # Hardcode max output length to 16
+        self.config.max_output_len = 16
+
         self.config.prepare()
 
         # Then override the base_seq_len if present