mirror of
https://github.com/theroyallab/tabbyAPI.git
synced 2026-03-15 00:07:28 +00:00
* Model: More extensive checks for paged attention

  Previously, TabbyAPI only checked whether the user's hardware supports flash attention before deciding whether to enable paged mode. This adds checks for whether no_flash_attention is set, whether flash-attn is installed, and whether the installed version supports paged attention.

* Tree: Format

* Tree: Lint

* Model: Check GPU architecture first

  Check the GPU architecture before checking whether flash attention 2 is installed.
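
The check order described above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the function names, the minimum compute capability (8, i.e. Ampere or newer), and the minimum flash-attn version for paged attention (2.5.7) are assumptions, as is placing the no_flash_attention check before the GPU architecture check. The GPU compute capabilities are passed in as plain integers so the sketch runs without a GPU.

```python
from importlib import metadata

# Assumed thresholds for illustration; they are not stated in the commit message.
MIN_COMPUTE_CAPABILITY_MAJOR = 8      # assumed: Ampere or newer
MIN_PAGED_FLASH_ATTN = (2, 5, 7)      # assumed: first flash-attn release with paged support


def parse_version(text):
    """Return the leading numeric components, e.g. '2.5.9.post1' -> (2, 5, 9)."""
    parts = []
    for piece in text.split("+")[0].split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def installed_flash_attn_version():
    """Return the installed flash-attn version string, or None if it is absent."""
    try:
        return metadata.version("flash_attn")
    except metadata.PackageNotFoundError:
        return None


def paged_mode_decision(no_flash_attention, gpu_majors, flash_attn_version):
    """Hypothetical helper: return (use_paged, reason) for the given inputs.

    gpu_majors is a list with the major compute capability of each visible GPU.
    """
    # 1. An explicit user opt-out wins over everything else.
    if no_flash_attention:
        return False, "no_flash_attention is set"
    # 2. GPU architecture is checked before looking at the installed package.
    if not gpu_majors or min(gpu_majors) < MIN_COMPUTE_CAPABILITY_MAJOR:
        return False, "GPU architecture does not support flash attention"
    # 3. flash-attn must be installed at all.
    if flash_attn_version is None:
        return False, "flash-attn is not installed"
    # 4. The installed version must be new enough for paged attention.
    if parse_version(flash_attn_version) < MIN_PAGED_FLASH_ATTN:
        return False, "installed flash-attn does not support paged attention"
    return True, "all checks passed"
```

A caller could pass `installed_flash_attn_version()` as the last argument and fall back to non-paged mode whenever the first element of the result is `False`, logging the reason so the user can see which check failed.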