Kawrakow
fd20638bbc
Allow bf16 kv-cache (#69)
On the CPU I get the exact same PPL with and without FA
using bf16 for kv-cache. But on CUDA the bf16 kv-cache
result is about the same as the fp16 kv-cache CPU result,
so I'm missing some conversion somewhere.
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2024-09-29 09:03:52 +03:00