ik_llama.cpp/ggml
Iwan Kawrakow 92adf7e6df Experimenting with flash attention on Zen4
This version outperforms no-FA at 8k tokens, but performance
matches no-FA at 16k and falls behind at 32k.
2024-08-30 07:38:54 +03:00