ik_llama.cpp/ggml
Iwan Kawrakow 1ac69af2fe Try interleaving 8 rows for iq4_xs
On Zen4, PP-512 goes up from ~260 t/s to 288 t/s for L3-8B.
TG-128 reaches max. performance at 2 threads and is slightly
higher than with 4 interleaved rows (14.48 t/s vs 13.11 t/s @ 2 threads
and 14.28 t/s @ 4 threads).
2025-01-25 11:01:44 +02:00