tabbyAPI/common
kingbri 62e9fa217a ExllamaV3: Handle max_seq_len defined and cache_size undefined case
The previous changes broke existing configs: max_seq_len was
force-overridden to 4096. This fix helps single-user setups, which
do not really benefit from splitting cache_size and max_seq_len
(except when batching).

cache_size is still the prime mover in exl3 due to its paging mechanism.
Ideally, for multi-user setups, cache_size should take as much VRAM
as possible and max_seq_len should be limited.

Breakdown:
cache_size and max_seq_len specified -> values
only one of cache_size/max_seq_len specified -> the other is set to the same value
neither specified -> cache_size = 4096, max_seq_len = min(max_position_embeddings, cache_size)
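
The breakdown above can be sketched as a small resolution function. This is only an illustration of the three cases as the message states them, not the actual tabbyAPI code: the function name `resolve_lengths`, its signature, and the `DEFAULT_CACHE_SIZE` constant are hypothetical.

```python
from typing import Optional, Tuple

# Default taken from the "neither specified" case in the breakdown.
DEFAULT_CACHE_SIZE = 4096


def resolve_lengths(
    cache_size: Optional[int],
    max_seq_len: Optional[int],
    max_position_embeddings: int,
) -> Tuple[int, int]:
    """Return (cache_size, max_seq_len) following the breakdown.

    Hypothetical helper: None stands for "not specified in the config".
    """
    # Both specified: use the values as given.
    if cache_size is not None and max_seq_len is not None:
        return cache_size, max_seq_len

    # Only cache_size specified: max_seq_len mirrors it.
    if cache_size is not None:
        return cache_size, cache_size

    # Only max_seq_len specified: cache_size mirrors it.
    if max_seq_len is not None:
        return max_seq_len, max_seq_len

    # Neither specified: default cache, sequence length capped by the model.
    cache_size = DEFAULT_CACHE_SIZE
    return cache_size, min(max_position_embeddings, cache_size)
```

For example, under these assumptions a config that sets only max_seq_len = 16384 resolves to a cache_size of 16384 as well, rather than being forced down to 4096.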

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-10-14 21:48:36 -04:00