mirror of
https://github.com/ikawrakow/ik_llama.cpp.git
synced 2026-04-28 10:21:48 +00:00
It was actually ready but not turned on. Having forgotten that, I made a new implementation along the lines of the fp16 implementation (i.e., using tiling). That matched tinyBLAS performance. But the existing implementation, which I have now turned on, is faster:
PP-512 = 134 t/s vs 128.3 t/s for tinyBLAS
TG-128 = 8.7 t/s vs 8.3 t/s for tinyBLAS (@ 4 threads)
160 KiB