ik_llama.cpp/ggml
Iwan Kawrakow c848533580 iq2_bn_r4: NEON
PP-512 is now 296 t/s. TG-128 is ~20% faster than iq2_bn
for 1 thread, but saturates to about the same 93 t/s at
8 threads.
2024-12-05 17:40:12 +01:00