Files
ik_llama.cpp/ggml
Kawrakow 0459f595d7 CUDA: correctly detect if flash attention is supported (#875)
* Don't use vector kernels if K or V are quantized

* Correctly determine if FA is supported

* Also wmma

* Minor

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2025-10-29 13:56:16 +02:00
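
Below is a minimal sketch of the kind of capability check the commit title and bullets describe: flash attention should only be reported as supported when the vector/wmma kernels can actually run, e.g. not when K or V are quantized. The function name, type enum, and supported head sizes here are illustrative assumptions, not the actual ik_llama.cpp/ggml API.

```cpp
#include <cstdint>

// Hypothetical stand-in for the GGML tensor type tags; the real enum lives in ggml.h.
enum class KVType { F16, F32, Q8_0, Q4_0 };

// Sketch of a flash-attention support check in the spirit of the commit:
// treat FA as unsupported when K or V are quantized (the vector and wmma
// kernels assume plain f16/f32 inputs), and only allow head sizes the
// kernels were built for. Values below are chosen for illustration only.
static bool fa_supported(KVType k_type, KVType v_type, int head_size) {
    const bool kv_plain =
        (k_type == KVType::F16 || k_type == KVType::F32) &&
        (v_type == KVType::F16 || v_type == KVType::F32);

    const bool head_ok =
        head_size == 64 || head_size == 128 || head_size == 256;

    return kv_plain && head_ok;
}
```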