Iwan Kawrakow | 4f53915dcb | 2024-06-22 12:02:49 +03:00
Cleanup - Arm i-quants should be good now
Still missing iq1_s and iq1_m, but I don't think I'll do those.

Iwan Kawrakow | 4b27ade2fb | 2024-06-22 12:02:49 +03:00
iqk_mul_mat: Arm implementation for iq3_s (llama.cpp version)
Here we get 3.65X (!) for PP-512 (53 t/s).

Iwan Kawrakow | 221a2c3807 | 2024-06-22 12:02:49 +03:00
Simplify

Iwan Kawrakow | 7dcca6aea7 | 2024-06-22 12:02:49 +03:00
iqk_mul_mat: Arm implementation for iq3_xxs (llama.cpp version)
We get 2.66X for PP-512 (42.35 t/s).

Iwan Kawrakow | effa4448d6 | 2024-06-22 12:02:49 +03:00
iqk_mul_mat: Arm implementation for iq2_xs (llama.cpp version)
We get 2.2X for PP-512 (52 t/s).

Iwan Kawrakow | d2ee9ab95e | 2024-06-22 12:02:49 +03:00
iqk_mul_mat: Arm implementation for iq2_s (llama.cpp version)
We get only 2.07X for PP-512, reaching 31 t/s, so iq2_s remains slow.

Iwan Kawrakow | 9ac9e928d5 | 2024-06-22 12:02:49 +03:00
Add Q8_0

Iwan Kawrakow | 3f996d0c70 | 2024-06-22 12:02:49 +03:00
Cosmetics

Iwan Kawrakow | d7ab97149f | 2024-06-22 12:02:49 +03:00
iqk_mul_mat: Arm implementation for iq2_xxs (llama.cpp version)
We get a ~5% speedup for TG-128 and 3X for PP-512.

Iwan Kawrakow | b51922530f | 2024-06-22 12:02:49 +03:00
iqk_mul_mat: faster q3_K TG
We get 31 t/s, up from 26 t/s, but we need to treat PP differently from TG, else we get a ~10% drop in PP performance.

Iwan Kawrakow | 19c578b413 | 2024-06-22 12:02:49 +03:00
iqk_mul_mat for llama.cpp