Files
tabbyAPI/backends
kingbri 62e9fa217a ExllamaV3: Handle max_seq_len defined and cache_size undefined case
The previous changes broke existing configs and max_seq_len was
force-overriden to 4096. This helps single-user setups since they
do not really benefit from the split cache_size max_seq_len mechanism
(except if batching).

cache_size is still the prime mover in exl3 due to its paging mechanism.
Ideally, for multi-user setups, cache_size should take as much VRAM
as possible and max_seq_len should be limited.

Breakdown:
cache_size and max_seq_len specified -> values
only cache_size/max_seq_len specified -> max_seq_len = cache_size and vice versa
neither specified -> cache_size = 4096, max_seq_len = min(max_position_embeddings, cache_size)

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-10-14 21:48:36 -04:00
..
2025-10-14 03:11:59 +02:00