mirror of
https://github.com/ikawrakow/ik_llama.cpp.git
synced 2026-02-25 23:54:10 +00:00
We now arrive at pp-512 = 147 t/s for LLaMA-3.1-8B. TG-128 is 9.5 t/s. This is better than last commit, but still kind of slow compared to Q6_K. My last commit message is wrong: also iq3_k needs a fix for overflow.