Mirror of https://github.com/theroyallab/tabbyAPI.git (synced 2026-03-14 15:57:27 +00:00)
Model: Fix chunk size handling

The wrong class attribute name was used for max_attention_size, and the draft model's chunk_size declaration was fixed. Also expose the parameter to the end user in both the config and the model load request.

Signed-off-by: kingbri <bdashore3@proton.me>
@@ -107,6 +107,10 @@ model:

   # Possible values FP16, FP8, Q4. (default: FP16)
   #cache_mode: FP16

+  # Chunk size for prompt ingestion. A lower value reduces VRAM usage at the cost of ingestion speed (default: 2048)
+  # NOTE: Effects vary depending on the model. An ideal value is between 512 and 4096
+  #chunk_size: 2048
+
   # Set the prompt template for this model. If empty, attempts to look for the model's chat template. (default: None)
   # If a model contains multiple templates in its tokenizer_config.json, set prompt_template to the name
   # of the template you want to use.
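Since the commit message says the parameter is also exposed on model load, a minimal sketch of what a client-side load request body could look like is shown below. The endpoint path, field names, and model name here are assumptions for illustration, based only on the commit message — check tabbyAPI's actual API reference before relying on them.

```python
import json

# Hypothetical model-load payload setting chunk_size explicitly.
# "name" and "chunk_size" field names are assumptions, not a verified
# tabbyAPI request schema; adjust to match the real API.
payload = {
    "name": "my-model",   # hypothetical model directory name
    "chunk_size": 1024,   # below the 2048 default to trade ingestion speed for VRAM
}

# Serialize as the JSON body you would POST to the (assumed) load endpoint,
# e.g. something like http://localhost:5000/v1/model/load.
body = json.dumps(payload)
print(body)
```

Per the config comment in the diff, values between 512 and 4096 are the suggested range; lower values reduce VRAM usage at the cost of prompt ingestion speed.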