mirror of
https://github.com/ikawrakow/ik_llama.cpp.git
synced 2026-03-02 10:00:07 +00:00
With that we hit 360 t/s for LlaMA-3.1-8B on a Ryzen-7950X. q8_k_r8 is 386 t/s, so for a batch size of 512 repacking costs ~7% of the time taken by the actual GEMM.