Files
ik_llama.cpp/ggml
Iwan Kawrakow cfee1b68ec WIP: plugging into ggml_compute_forward_flash_attn_ext_f16
OK, if we take into account that the mask is diagonal
and skip further computation once we encounter -INFINITY,
we can speed it up and bring it on par with no-FA.
Better than nothing, but still no luck.
2024-08-24 14:31:13 +03:00
2024-07-27 07:55:01 +02:00
2024-07-27 07:55:01 +02:00