ik_llama.cpp/ggml
Iwan Kawrakow 58c13d0574 q8_KV: ARM_NEON
We get PP-512 = 167 t/s for L3-8B without interleaving!
We do the interleaving on the fly, so I wonder if this
could be done for other quants as well.
2025-02-19 10:03:15 +02:00
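To illustrate what "interleaving on the fly" in the commit message could mean, here is a minimal sketch in plain C. It is not code from ik_llama.cpp: the helper name `interleave4_i8`, the choice of 4 rows, and the 4-byte group size are all assumptions made for illustration. The idea shown is that rows stored in ordinary row-major order are rearranged into a small scratch buffer at matrix-multiplication time, so that a SIMD kernel can work on several rows per load, instead of keeping a separately repacked copy of the tensor.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from ik_llama.cpp): interleave 4 rows of int8
 * quants in groups of 4 bytes. After the call, dst holds
 *   r0[0..3] r1[0..3] r2[0..3] r3[0..3] r0[4..7] r1[4..7] ...
 * n is the number of quants per row and must be a multiple of 4.
 * dst must have room for 4 * n bytes. */
static void interleave4_i8(const int8_t *rows[4], size_t n, int8_t *dst) {
    for (size_t k = 0; k < n; k += 4) {        /* step over 4-byte groups */
        for (int r = 0; r < 4; ++r) {          /* visit each of the 4 rows */
            for (int j = 0; j < 4; ++j) {      /* copy one group from row r */
                *dst++ = rows[r][k + j];
            }
        }
    }
}
```

A real kernel would presumably do this rearrangement with vector instructions on a small tile that stays in registers or cache; the scalar loop above only shows the resulting memory layout.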