ikawrakow/ik_llama.cpp

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-01-26 09:09:50 +00:00

Thomas eaa2510a28 Add GitHub data: filename sanitization (#640 )

2025-07-23 13:31:53 +02:00


🔀 #252 - MLA-2: Allow usage of q8_0 for KV cache on CUDA

Author	`ikawrakow`
State	❌ Closed
Created	2025-03-12
Updated	2025-03-12

Description

With a q8_0 KV cache on CUDA, performance is slightly lower than with an f16 KV cache, but the difference is not large.
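
For context on the trade-off, q8_0 is ggml's 8-bit block format: each block of 32 values is stored as one fp16 scale plus 32 signed 8-bit integers, so a block takes 34 bytes instead of the 64 bytes needed for 32 f16 values. The following is a minimal Python sketch of that scheme, assuming the usual `scale = max|x| / 127` rule. It is illustrative only and is not the CUDA code from this PR; the function names `quantize_q8_0` and `dequantize_q8_0` are hypothetical.

```python
import math
import struct

QK8_0 = 32                       # values per q8_0 block
Q8_0_BLOCK_BYTES = 2 + QK8_0     # fp16 scale + 32 int8 quants = 34 bytes
F16_BLOCK_BYTES = 2 * QK8_0      # the same 32 values stored as f16 = 64 bytes


def _to_fp16(x):
    """Round a Python float to the nearest half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def _round_half_away(x):
    """Round to nearest integer, ties away from zero (like C's roundf)."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def quantize_q8_0(values):
    """Split `values` into blocks of 32 and return a list of (scale, quants)."""
    if len(values) % QK8_0 != 0:
        raise ValueError("length must be a multiple of 32")
    blocks = []
    for start in range(0, len(values), QK8_0):
        chunk = values[start:start + QK8_0]
        amax = max(abs(v) for v in chunk)
        d = amax / 127.0                      # per-block scale
        inv_d = 1.0 / d if d != 0.0 else 0.0
        quants = [_round_half_away(v * inv_d) for v in chunk]
        blocks.append((_to_fp16(d), quants))  # scale is stored as fp16
    return blocks


def dequantize_q8_0(blocks):
    """Reconstruct approximate floats from (scale, quants) blocks."""
    return [d * q for d, quants in blocks for q in quants]
```

A short usage example: quantizing one block and checking the round-trip error gives a feel for the precision that is given up in exchange for the smaller cache.

```python
values = [(i - 16) / 8 for i in range(32)]    # 32 values in [-2.0, 1.9375]
blocks = quantize_q8_0(values)
restored = dequantize_q8_0(blocks)
max_err = max(abs(a - b) for a, b in zip(values, restored))
print(Q8_0_BLOCK_BYTES, F16_BLOCK_BYTES, max_err)
```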