ik_llama.cpp/ggml
Iwan Kawrakow 7784c8928f Per row scales - CUDA
The only place left where unnecessary assumptions are made is the Flash
Attention code. Since we are not using any quants with per-row scales for
the quantized KV cache, it should be OK for now.
2024-09-25 13:10:34 +03:00