ik_llama.cpp/ggml
Iwan Kawrakow 8a83e1f083 cuda: re-add q8_0 -> q8_0 transpose
so mla = 2 can be used with CUDA graphs and q8_0 cache.
2025-08-14 19:33:58 +03:00