ik_llama.cpp/ggml
Iwan Kawrakow 71f5b941bf Experimenting with flash attention on Zen4
This version is finally faster up to 32k tokens.
At 32k tokens it beats no-FA by 23%, at 16k by 20%,
at 8k by 10%.
2024-08-30 15:37:52 +03:00