Mirror of https://github.com/ikawrakow/ik_llama.cpp.git, synced 2026-04-25 00:49:34 +00:00
The only remaining place where unnecessary assumptions are made is the Flash Attention code. Since we are not using any quantization types with per-row scales for the quantized KV cache, it should be OK for now.