Commit Graph

8 Commits

Author SHA1 Message Date
Kawrakow
6aa7ac9cd3 iqk_mul_mat: Arm implementation for iq3_xxs (llama.cpp version)
We get 2.66X for PP-512 (42.35 t/s)
2024-06-22 12:02:49 +03:00
Kawrakow
d041c81b1d iqk_mul_mat: Arm implementation for iq2_xs (llama.cpp version)
We get 2.2X for PP-512 (52 t/s)
2024-06-22 12:02:49 +03:00
Kawrakow
3fe4e1b27c iqk_mul_mat: Arm implementation for iq2_s (llama.cpp version)
We get only 2.07X for PP-512, reaching 31 t/s,
so iq2_s remains slow.
2024-06-22 12:02:49 +03:00
Kawrakow
4c0920cb1b Add Q8_0 2024-06-22 12:02:49 +03:00
Kawrakow
62122c1950 Cosmetics 2024-06-22 12:02:49 +03:00
Kawrakow
fb8bc26dc5 iqk_mul_mat: Arm implementation for iq2_xxs (llama.cpp version)
We get ~5% speedup for TG-128, 3X for PP-512
2024-06-22 12:02:49 +03:00
Kawrakow
a18a564e54 iqk_mul_mat: faster q3_K TG
We get 31 t/s up from 26 t/s, but we need to treat
PP differently from TG, else we get a ~10% drop in
PP performance.
2024-06-22 12:02:49 +03:00
Kawrakow
d434b4751a iqk_mul_mat for llama.cpp 2024-06-22 12:02:49 +03:00