ik_llama.cpp/iqk_mul_mat.cpp
Kawrakow 64da6f7a97 iqk_mul_mat: add q8_0
The q8_0 implementation was actually ready but not turned on.
Having forgotten that, I wrote a new implementation along the
lines of the fp16 implementation (i.e., using tiling).
That matched tinyBLAS performance, but the existing
implementation that I have now turned on is faster:
PP-512 = 134 t/s vs 128.3 t/s for tinyBLAS
TG-128 = 8.7 t/s vs 8.3 t/s for tinyBLAS (@ 4 threads)
2024-06-22 12:02:50 +03:00

160 KiB