ik_llama.cpp/ggml
Iwan Kawrakow 31ed9b331e WIP: plugging into ggml_compute_forward_flash_attn_ext_f16
Now everything is done in iqk_flash_helper_2.
It is slower than no FA:
at 2048 tokens we have 167 t/s vs 176 t/s without.
This is better than Georgi's FA (138 t/s), but...
At 8192 tokens we degrade to 93 t/s vs 134 t/s without.
2024-08-23 16:48:35 +03:00