🔀 #252 - MLA-2: Allow usage of q8_0 for KV cache on CUDA

Author: ikawrakow
State: Closed
Created: 2025-03-12
Updated: 2025-03-12

Description

Performance is slightly lower than with an f16 KV cache, but not too bad.
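
For context on what a q8_0 cache trades against f16, here is a minimal sketch of the q8_0 block format as it is laid out in ggml: 32 values share one f16 scale and are stored as 32 signed 8-bit quants. This is an illustration only and not the CUDA code from this PR; the helper names (`quantize_block_q8_0`, `dequantize_block_q8_0`, `bytes_per_value_q8_0`) are hypothetical and chosen for this example.

```python
import math
import struct

QK8_0 = 32  # number of values per q8_0 block in ggml


def to_f16(x: float) -> float:
    """Round a float to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def round_half_away(x: float) -> int:
    """Round to nearest integer, ties away from zero (like C roundf)."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def quantize_block_q8_0(values):
    """Quantize 32 floats into (f16 scale, list of 32 int8 quants)."""
    assert len(values) == QK8_0
    amax = max(abs(v) for v in values)
    d = amax / 127.0                      # scale so the largest value maps to +/-127
    inv_d = 1.0 / d if d != 0.0 else 0.0
    qs = [max(-127, min(127, round_half_away(v * inv_d))) for v in values]
    return to_f16(d), qs                  # the scale is stored in half precision


def dequantize_block_q8_0(d, qs):
    """Reconstruct approximate floats from a q8_0 block."""
    return [d * q for q in qs]


def bytes_per_value_q8_0() -> float:
    """2 bytes of scale plus 32 bytes of quants, spread over 32 values."""
    return (2 + QK8_0) / QK8_0


if __name__ == "__main__":
    block = [float(i - 16) for i in range(QK8_0)]
    d, qs = quantize_block_q8_0(block)
    restored = dequantize_block_q8_0(d, qs)
    print("max abs error:", max(abs(a - b) for a, b in zip(block, restored)))
    print("bytes per value:", bytes_per_value_q8_0(), "vs 2.0 for f16")
```

Under this layout a block takes 34 bytes for 32 values, so a q8_0 cache needs roughly half the memory of an f16 cache, at the cost of the extra dequantization work that the slight slowdown above reflects.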