Files
ik_llama.cpp/src
Kawrakow cd7e7b6bbc Allow bf16 kv-cache (#69)
On the CPU I get the exact same PPL with and without FA
using bf16 for kv-cache. But on CUDA the bf16 kv-cache
result is about the same as the fp16 kv-cache CPU result,
so I'm missing some conversion somewhere.

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2024-09-29 09:03:52 +03:00
..
2024-07-27 07:55:01 +02:00
2024-07-27 07:55:01 +02:00
2024-09-28 17:59:47 +03:00
2024-07-27 07:55:01 +02:00
2024-09-29 09:03:52 +03:00
2024-07-27 07:55:01 +02:00
2024-07-27 07:55:01 +02:00
2024-07-27 07:55:01 +02:00
2024-07-27 07:55:01 +02:00