ik_llama.cpp/ggml
Iwan Kawrakow d12d0e9b04 Allow bf16 kv-cache
On the CPU I get the exact same PPL with and without FA
using bf16 for kv-cache. But on CUDA the bf16 kv-cache
result is about the same as the fp16 kv-cache CPU result,
so I'm missing some conversion somewhere.
2024-09-29 08:42:33 +03:00
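The commit message's reasoning rests on bf16 and fp16 rounding the same float32 value differently, so a perplexity that matches the fp16 run hints that an fp16 step is still in the path. The following is a minimal sketch of that difference in plain Python, not code from this repository. The helper names `to_fp16` and `to_bf16` are hypothetical, and the bf16 rounding shown (round to nearest even on the float32 bits, finite inputs only) is an assumption about how such a conversion is typically done.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to IEEE half precision and back (struct format 'e')."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_bf16(x: float) -> float:
    """Round x to bfloat16 and back. Assumes x is finite (no NaN/Inf handling)."""
    # Reinterpret the float32 encoding of x as a 32-bit unsigned integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round to nearest even: add a bias that depends on the lowest kept bit,
    # then drop the low 16 bits, leaving sign, 8 exponent and 7 mantissa bits.
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


if __name__ == "__main__":
    x = 0.1
    print("fp16:", to_fp16(x), "abs error:", abs(to_fp16(x) - x))
    print("bf16:", to_bf16(x), "abs error:", abs(to_bf16(x) - x))
```

fp16 keeps 10 explicit mantissa bits and bf16 keeps 7, so the two round trips give different values for most inputs. That is why identical perplexity between a bf16 run and an fp16 run is suspicious rather than expected.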