tabbyAPI/backends
kingbri 773639ea89 Model: Fix flash-attn checks
If flash attention is already turned off by ExLlamaV2 itself, don't
try creating a paged generator. Also condense all the redundant
logic into one if statement.

Also check arch_compat_overrides to see if flash attention should
be disabled for a model arch (e.g. Gemma 2)

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-06 20:58:24 -04:00
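The commit folds every reason to skip the paged (flash-attention) generator into one condition. The sketch below is a minimal illustration of that idea, not tabbyAPI's actual code. `LoadState`, its fields, `ARCHS_WITHOUT_FLASH_ATTN` and `use_paged_generator` are all hypothetical names. The architecture set stands in for whatever `arch_compat_overrides` reports, and Gemma 2 is listed only because the commit message names it.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the architectures whose compat overrides
# disable flash attention. Gemma 2 is the example from the commit message.
ARCHS_WITHOUT_FLASH_ATTN = {"Gemma2ForCausalLM"}


@dataclass
class LoadState:
    """Hypothetical snapshot of the settings that decide the generator mode."""

    arch: str
    flash_attn_installed: bool  # assumption: the flash-attn package imported cleanly
    no_flash_attn: bool  # assumption: True when ExLlamaV2 already turned it off
    user_disabled_paged: bool = False  # hypothetical user-level override


def use_paged_generator(state: LoadState) -> bool:
    """Return True only if a paged (flash-attention) generator should be created.

    Every fallback reason sits in a single if statement rather than in
    several scattered checks.
    """
    if (
        not state.flash_attn_installed
        or state.no_flash_attn  # backend already disabled flash attention
        or state.arch in ARCHS_WITHOUT_FLASH_ATTN  # arch-specific override
        or state.user_disabled_paged
    ):
        return False
    return True


if __name__ == "__main__":
    print(use_paged_generator(LoadState("LlamaForCausalLM", True, False)))
    print(use_paged_generator(LoadState("Gemma2ForCausalLM", True, False)))
```

With a single condition, adding a new reason to fall back (such as another architecture override) means editing one expression. The alternative is keeping several checks that each have to agree.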