Mirror of https://github.com/theroyallab/tabbyAPI.git, synced 2026-03-15 00:07:28 +00:00
Model: Adjust max output len

Max output len is now hardcoded to 16, since it is the number of tokens predicted per forward pass. 16 works well for both normal inference and speculative decoding, and it also saves VRAM compared to the previous default of 2048.

Signed-off-by: kingbri <bdashore3@proton.me>
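A rough back-of-the-envelope sketch of the VRAM saving mentioned above: an output logits buffer scales with max_output_len × vocab_size, so shrinking it from 2048 positions to 16 shrinks that buffer by the same factor. The numbers and names below are illustrative assumptions (a typical Llama vocabulary, fp16 logits), not exllamav2 internals.

```python
# Illustrative estimate of the logits-buffer saving; not exllamav2 code.

VOCAB_SIZE = 32_000   # assumed: typical Llama-family vocabulary size
BYTES_PER_LOGIT = 2   # assumed: fp16 logits

def logit_buffer_bytes(max_output_len: int, vocab_size: int = VOCAB_SIZE) -> int:
    """Bytes needed for a logits buffer covering max_output_len positions."""
    return max_output_len * vocab_size * BYTES_PER_LOGIT

old = logit_buffer_bytes(2048)  # previous default
new = logit_buffer_bytes(16)    # value hardcoded by this commit

print(f"2048 tokens: {old / 2**20:.1f} MiB")  # → 125.0 MiB
print(f"  16 tokens: {new / 2**20:.1f} MiB")  # → 1.0 MiB
```

Under these assumptions the buffer drops from roughly 125 MiB to about 1 MiB, which is why a small per-pass output length is attractive when only a handful of tokens are predicted per forward pass.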
```diff
@@ -143,6 +143,10 @@ class ExllamaV2Container:
         # Make the max seq len 4096 before preparing the config
         # This is a better default than 2038
         self.config.max_seq_len = 4096

+        # Hardcode max output length to 16
+        self.config.max_output_len = 16
+
         self.config.prepare()

         # Then override the base_seq_len if present
```
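The ordering in the hunk matters: the config fields are assigned before `prepare()` is called, since preparing the config is what sizes the model's buffers from those fields. A minimal stand-in sketch of that pattern, using a hypothetical `Config` class rather than exllamav2's actual `ExLlamaV2Config`:

```python
# Hypothetical stand-in for the set-fields-then-prepare pattern above;
# not exllamav2's real config class.

class Config:
    def __init__(self):
        self.max_seq_len = 2048     # assumed library defaults
        self.max_output_len = 2048
        self.prepared = False

    def prepare(self):
        # In the real library this step sizes buffers from the fields
        # above, so they must hold their final values before this call.
        self.prepared = True

config = Config()
config.max_seq_len = 4096    # better default than 2048
config.max_output_len = 16   # tokens predicted per forward pass
config.prepare()

print(config.prepared, config.max_output_len)  # → True 16
```

Anything that must override what `prepare()` derived (like `base_seq_len` in the hunk) happens only after the call, mirroring the order in the diff.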