ik_llama.cpp/iqk_mul_mat.cpp at 64da6f7a971eda1030f0f641ba6b43dca6d0dcc6

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-04-28 10:21:48 +00:00


Kawrakow 64da6f7a97 iqk_mul_mat: add q8_0

It was actually ready but not turned on.
Having forgotten, I made a new implementation along the
lines of the fp16 implementation (i.e., using tiling).
That matched tinyBLAS performance. But the existing
implementation that I now turned on is faster:
PP-512 = 134 t/s vs 128.3 t/s for tinyBLAS
TG-128 = 8.7 t/s vs 8.3 t/s for tinyBLAS (@ 4 threads)
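
For readers unfamiliar with the terms in the message above, here is a minimal sketch of a tiled matrix multiplication over Q8_0-style blocks. It is not the code in iqk_mul_mat.cpp and has no SIMD. The names `BlockQ8`, `quantize_row`, `block_dot` and `mul_mat_q8_tiled` are hypothetical, the scale is held as a `float` (ggml stores it as fp16), and the tile size of 4 is an arbitrary choice for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int kBlock = 32;  // values per Q8_0 block

// Simplified Q8_0 block: one scale plus 32 signed 8-bit quants.
// Assumption: float scale for portability (ggml uses fp16 here).
struct BlockQ8 {
    float  d;
    int8_t qs[kBlock];
};

// Quantize n floats into n / kBlock blocks; n must be a multiple of kBlock.
std::vector<BlockQ8> quantize_row(const float* x, int n) {
    assert(n % kBlock == 0);
    std::vector<BlockQ8> out(n / kBlock);
    for (int b = 0; b < n / kBlock; ++b) {
        float amax = 0.0f;
        for (int i = 0; i < kBlock; ++i)
            amax = std::max(amax, std::fabs(x[b * kBlock + i]));
        const float d  = amax / 127.0f;             // block scale
        const float id = d > 0.0f ? 1.0f / d : 0.0f;
        out[b].d = d;
        for (int i = 0; i < kBlock; ++i)
            out[b].qs[i] = static_cast<int8_t>(std::lround(x[b * kBlock + i] * id));
    }
    return out;
}

// Dot product of two blocks: integer accumulation, then one scaling.
float block_dot(const BlockQ8& a, const BlockQ8& b) {
    int32_t sum = 0;
    for (int i = 0; i < kBlock; ++i)
        sum += int32_t(a.qs[i]) * int32_t(b.qs[i]);
    return a.d * b.d * float(sum);
}

// C[i*N + j] = dot(row i of A, row j of B), each row holding nb blocks.
// The output is walked in tile x tile patches so that the blocks loaded
// for a patch are reused across several (i, j) pairs.
void mul_mat_q8_tiled(const std::vector<BlockQ8>& A, const std::vector<BlockQ8>& B,
                      std::vector<float>& C, int M, int N, int nb, int tile = 4) {
    C.assign(std::size_t(M) * N, 0.0f);
    for (int i0 = 0; i0 < M; i0 += tile) {
        for (int j0 = 0; j0 < N; j0 += tile) {
            const int i1 = std::min(i0 + tile, M);
            const int j1 = std::min(j0 + tile, N);
            for (int k = 0; k < nb; ++k)
                for (int i = i0; i < i1; ++i)
                    for (int j = j0; j < j1; ++j)
                        C[std::size_t(i) * N + j] +=
                            block_dot(A[std::size_t(i) * nb + k], B[std::size_t(j) * nb + k]);
        }
    }
}
```

A real kernel would replace the inner loops with vector instructions and keep the tile accumulators in registers. The sketch only shows the loop structure that "tiling" refers to.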

2024-06-22 12:02:50 +03:00

160 KiB
