mirror of
https://github.com/theroyallab/tabbyAPI.git
synced 2026-03-14 15:57:27 +00:00
The default is the minimum of max_position_embeddings and cache_size. On AMD GPUs and on NVIDIA GPUs older than Ampere, cache_size is ignored because batching on exl2 is not supported on those GPUs.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
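
The rule in the commit message can be sketched roughly as follows. This is only an illustration, not the code from the repository: the function name `resolve_default` and the `cache_size_supported` flag are hypothetical, and falling back to max_position_embeddings when cache_size is ignored is an assumption.

```python
from typing import Optional


def resolve_default(
    max_position_embeddings: int,
    cache_size: Optional[int],
    cache_size_supported: bool,
) -> int:
    """Hypothetical helper: pick the default described in the commit message."""
    # Assumption: when cache_size is ignored (AMD or pre-Ampere NVIDIA GPUs)
    # or was never set, only max_position_embeddings bounds the default.
    if not cache_size_supported or cache_size is None:
        return max_position_embeddings
    # Otherwise the default is the smaller of the two values.
    return min(max_position_embeddings, cache_size)
```

With these hypothetical names, a model with 4096 position embeddings and a cache_size of 2048 would default to 2048 on a supported GPU and to 4096 where cache_size is ignored.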
8.5 KiB