mirror of
https://github.com/ikawrakow/ik_llama.cpp.git
synced 2026-02-26 08:04:09 +00:00
Now everything is done in iqk_flash_helper_2. It is slower than no FA: at 2048 tokens we get 167 t/s vs 176 t/s without FA. This is better than Georgi's FA (138 t/s), but at 8192 tokens we degrade to 93 t/s vs 134 t/s without FA.
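To put the quoted throughputs side by side, they can be expressed as ratios. This is a minimal Python sketch. `speed_ratio` is a hypothetical helper written for illustration and is not part of ik_llama.cpp. The only inputs are the t/s figures above.

```python
def speed_ratio(a_tps: float, b_tps: float) -> float:
    """Return a_tps / b_tps; a value > 1 means `a` is faster than `b`."""
    return a_tps / b_tps

# Throughputs quoted in the commit message, in tokens per second.
results = {
    "2048 tokens: iqk FA vs no FA": speed_ratio(167, 176),
    "2048 tokens: iqk FA vs Georgi's FA": speed_ratio(167, 138),
    "8192 tokens: iqk FA vs no FA": speed_ratio(93, 134),
}

for label, r in results.items():
    # Print the ratio and the corresponding relative speed change in percent.
    print(f"{label}: {r:.3f} ({(r - 1) * 100:+.1f}%)")
```

Relative to no FA, the gap grows from roughly 5% at 2048 tokens to roughly 31% at 8192 tokens. At 2048 tokens the new path is about 21% faster than Georgi's FA.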