Mirror of https://github.com/theroyallab/tabbyAPI.git, synced 2026-03-14 15:57:27 +00:00
Config: Fix comments for max_seq_len and cache_size
The default is the minimum of max_position_embeddings and cache_size. On AMD GPUs and on NVIDIA GPUs older than Ampere, cache_size is ignored because batching on exl2 does not support it there.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
@@ -78,11 +78,14 @@ model:
   # Options: exllamav2, exllamav3
   backend:
 
-  # Max sequence length (default: fetch from the model's config.json).
+  # Max sequence length (default: min(max_position_embeddings, cache_size)).
+  # Set to -1 to fetch from the model's config.json
   max_seq_len:
 
   # Size of the key/value cache to allocate, in tokens (default: 4096).
   # Must be a multiple of 256.
+  # ExllamaV2 note: On AMD GPUs and NVIDIA GPUs older than Ampere, this value
+  # is ignored. Please use max_seq_len
   cache_size:
 
   # Enable different cache modes for VRAM savings (default: FP16).
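The "must be a multiple of 256" constraint on cache_size from the comment above can be sketched as a validation step. This is an illustrative check only, assuming a hypothetical helper name, not tabbyAPI's actual validation code.

```python
def validate_cache_size(cache_size: int) -> int:
    # Per the config comment, cache_size (in tokens) must be a
    # multiple of 256; reject anything else.
    if cache_size % 256 != 0:
        raise ValueError(
            f"cache_size must be a multiple of 256, got {cache_size}"
        )
    return cache_size
```

So the default of 4096 passes (4096 = 16 × 256), while a value such as 1000 would be rejected.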