ik_llama.cpp/ggml
Iwan Kawrakow 2b58f31b36 FA: allow bf16 for V-cache with any supported K-cache
E.g., -ctk q8_0 -ctv bf16 is slightly faster than
-ctk q8_0 -ctv q8_0 on Zen4 for context lengths that are not too long
(say, <= 4096).
2025-01-14 11:40:05 +02:00