ik_llama.cpp/ggml
Kawrakow 8c94dcd433 Zen4 Flash Attention 2 (#36)
* Zen4 Flash Attention: WIP generalize to other types

Data from K and V is now loaded via a template parameter,
which should make it easy to generalize to types other than
F16 for the K and V cache.

* Zen4 Flash Attention: it works for q4_0 and q8_0

* Zen4 Flash Attention: small q8_0 performance improvement

* Zen4 Flash Attention: add q4_1

* Delete unused stuff
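
The first item above says the K and V data is loaded through a template parameter. A minimal sketch of that idea follows; it is not the code from this commit. All names (`F32Loader`, `Q8Loader`, `BlockQ8`, `qk_scores`, `quantize_q8`) are hypothetical, the kernel is a plain scalar Q·K dot product rather than Zen4 SIMD, `float` stands in for F16 to stay portable, and the 8-bit block layout is a simplified assumption rather than ggml's actual q8_0 struct.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical loader for an unquantized cache (float stands in for F16).
struct F32Loader {
    const float* data;
    int stride;  // number of floats per row
    void load(int row, int n, float* out) const {
        for (int i = 0; i < n; ++i) out[i] = data[row * stride + i];
    }
};

// Simplified 8-bit block: one float scale per 32 int8 values (assumed layout).
struct BlockQ8 {
    float d;
    int8_t qs[32];
};

// Hypothetical loader that dequantizes a row of BlockQ8 on the fly.
struct Q8Loader {
    const BlockQ8* blocks;
    int blocks_per_row;
    void load(int row, int n, float* out) const {
        const BlockQ8* b = blocks + row * blocks_per_row;
        for (int i = 0; i < n; ++i) out[i] = b[i / 32].d * b[i / 32].qs[i % 32];
    }
};

// The kernel is written once; the cache type is chosen by the template parameter.
template <typename KLoader>
std::vector<float> qk_scores(const KLoader& k, const float* q, int n_rows, int head_dim) {
    std::vector<float> row(head_dim), scores(n_rows);
    for (int r = 0; r < n_rows; ++r) {
        k.load(r, head_dim, row.data());
        float s = 0.0f;
        for (int i = 0; i < head_dim; ++i) s += q[i] * row[i];
        scores[r] = s;
    }
    return scores;
}

// Helper for the example: quantize n floats (n must be a multiple of 32).
std::vector<BlockQ8> quantize_q8(const float* x, int n) {
    std::vector<BlockQ8> out(n / 32);
    for (int b = 0; b < n / 32; ++b) {
        float amax = 0.0f;
        for (int i = 0; i < 32; ++i) amax = std::max(amax, std::fabs(x[32 * b + i]));
        const float d = amax / 127.0f;
        out[b].d = d;
        for (int i = 0; i < 32; ++i) {
            out[b].qs[i] = d > 0.0f ? static_cast<int8_t>(std::lround(x[32 * b + i] / d)) : 0;
        }
    }
    return out;
}
```

Under these assumptions, supporting another cache type means writing one more loader with the same `load` signature; `qk_scores` itself stays unchanged, e.g. `qk_scores(Q8Loader{blocks, head_dim / 32}, q, n_rows, head_dim)`.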

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2024-09-04 07:20:55 +03:00