ik_llama.cpp/ggml
Kawrakow 77396a74b5 Better FlashMLA (#243)
* This is a better FA for TG

It should benefit both MLA and GQA. Tested with
DeepSeek-Lite MLA; not yet tested with GQA.
For tg64@pp8192 it is ~13% faster than MLA without FA,
and 57% faster than the main branch FA.

* WIP

* Cleanup

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2025-03-07 09:46:58 +02:00