Files
ik_llama.cpp/ggml
Iwan Kawrakow 9354ea22f6 Try interleaving 8 iq4_xs rows
It is also faster on AVX2.

This is the NEON implementation. It is a tiny bit faster than
4 interleaved rows (~0.5%).

So, this looks like a winner given the Zen4/AVX2 improvement
without an associated NEON regression.
2025-01-25 15:17:23 +02:00