ik_llama.cpp/ggml
Iwan Kawrakow cd4266eb58 Experimenting with dequant + f16 GEMM on NEON
iq2_kt: PP512 = 79 t/s from 42 t/s
iq3_kt: PP512 = 81 t/s from 35 t/s

Also, found the reason why the f16 implementation for iq4_kt was
not working: it overflows. It works after multiplying with the row scale
before doing the multiply-adds.
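
A toy illustration of the overflow described above. This is not the NEON kernel from the commit: the values, the `row_scale`, and the helpers `to_f16` and `dot_f16` are all hypothetical, and half precision is only modelled by rounding every multiply-add result through Python's `struct` "e" format. It assumes the unscaled dequantized weights are large enough that the running sum exceeds the largest finite f16 value (65504) before the row scale is applied.

```python
import struct

def to_f16(x: float) -> float:
    """Round x to the nearest half-precision value; out-of-range becomes +/-inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses to pack finite values beyond the f16 range
        return float("inf") if x > 0 else float("-inf")

def dot_f16(weights, activations):
    """Dot product with every multiply-add result rounded to f16."""
    acc = 0.0
    for w, a in zip(weights, activations):
        acc = to_f16(acc + to_f16(w) * to_f16(a))
    return acc

n = 128
row_scale = 2.0 ** -7   # hypothetical per-row scale
raw = [128.0] * n       # hypothetical unscaled dequantized weights
act = [8.0] * n         # hypothetical activations

# Scale applied after the multiply-adds: the f16 accumulator overflows first.
late = to_f16(dot_f16(raw, act) * row_scale)

# Scale applied before the multiply-adds: every intermediate stays in range.
early = dot_f16([to_f16(w * row_scale) for w in raw], act)

print(late, early)  # inf 1024.0
```

Both orderings are mathematically the same product, but only the second keeps the intermediate sums representable in f16.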
2025-05-31 16:10:33 +03:00