mirror of
https://github.com/ikawrakow/ik_llama.cpp.git
synced 2026-02-25 15:44:10 +00:00
We get PP-512(LLaMA-3.1-8B) = 287 t/s. Also cherry-picked the q3_k_r4 AVX2 adaptation that I somehow forgot to push upstream.
We get PP-512(LLaMA-3.1-8B) = 287 t/s. Also cherry-picked the q3_k_r4 AVX2 adaptation that I somehow forgot to push upstream.