mirror of
https://github.com/ikawrakow/ik_llama.cpp.git
synced 2026-03-10 22:10:20 +00:00
* Bounds for flash attention
* Add n_swa to the FA parameters
* Fix it
* This seems very slightly better
* Use the vec kernel when we have SWA
* Need also this
* f32 vec kernel
* This is slightly better

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>