ik_llama.cpp/ggml
Kawrakow 4a6a6f17ee Alternative CUDA FA for SWA models (#754)
* Bounds for flash attention

* Add n_swa to FA parameters

* Fix it

* This seems very slightly better

* Using vec kernel when we have SWA

* Need also this

* f32 vec kernel

* This is slightly better

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2025-09-04 08:42:18 +02:00