ik_llama.cpp/ggml
Iwan Kawrakow e08e292bea q8_KV_r8: don't use nrc_y = 16 on Zen4
This is faster: 350 t/s. Why?
Much better than the 290 t/s we had before, but still slower
than the 370 t/s for q8_k_r8.
2025-02-19 10:03:15 +02:00