ik_llama.cpp/iqk_mul_mat.cpp
Kawrakow e6d8441397 iq1_bn: better NEON implementation
PP (prompt processing) is decent at 131 t/s (q4_0 gets 150 t/s).
TG (token generation) is better than in the last commit but still
bad at 33.1 t/s (for comparison, q4_0 gets 52.3 t/s).

I had to go to the (0, 1, 2) table. Apple Silicon clearly
does not like operations with signs.
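
For illustration only: the kernel itself is not shown here, so the following is a scalar sketch of the general idea, assuming the (0, 1, 2) table means ternary weights stored with a +1 offset instead of as signed (-1, 0, 1) values. The dot product then needs only unsigned weight codes plus one correction by the sum of the activations. The function names are hypothetical and are not taken from iqk_mul_mat.cpp, and no NEON intrinsics are used.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reference: dot product with signed ternary weights w[i] in {-1, 0, 1}.
int dot_signed(const std::vector<int8_t>& w, const std::vector<int8_t>& y) {
    assert(w.size() == y.size());
    int sum = 0;
    for (std::size_t i = 0; i < w.size(); ++i) sum += w[i] * y[i];
    return sum;
}

// Map signed ternary weights to unsigned codes q[i] = w[i] + 1 in {0, 1, 2}.
// (Hypothetical helper; the real format packs several weights per byte.)
std::vector<uint8_t> to_unsigned_codes(const std::vector<int8_t>& w) {
    std::vector<uint8_t> q(w.size());
    for (std::size_t i = 0; i < w.size(); ++i) {
        q[i] = static_cast<uint8_t>(w[i] + 1);
    }
    return q;
}

// Same result from the unsigned codes, using
//   sum((q - 1) * y) = sum(q * y) - sum(y).
// The weights never carry a sign; the offset is removed once at the end.
int dot_unsigned(const std::vector<uint8_t>& q, const std::vector<int8_t>& y) {
    assert(q.size() == y.size());
    int sum_qy = 0;
    int sum_y = 0;
    for (std::size_t i = 0; i < q.size(); ++i) {
        sum_qy += q[i] * y[i];
        sum_y += y[i];
    }
    return sum_qy - sum_y;
}
```

The cost of the unsigned form is one extra sum over the activations, which can be computed once per activation block and reused for every weight row.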
2024-06-22 12:02:51 +03:00

178 KiB