ik_llama.cpp/ggml
Iwan Kawrakow 71f5b941bf Experimenting with flash attention on Zen4
This version is finally faster up to 32k tokens.
At 32k tokens it beats no-FA by 23%, at 16k by 20%,
at 8k by 10%.
2024-08-30 15:37:52 +03:00