ik_llama.cpp/ggml
Iwan Kawrakow 41c8200d08 iq1_tn: improve Zen4
PP-512 goes to 485 t/s up from 352. With FA we get 545 t/s up from 380.
TG-128 @ 1 thread goes to 12.4 t/s up from 10.4.
However, we seem to have a bottleneck somewhere as
TG saturates at 8 threads.
2024-09-09 09:02:33 +03:00