mirror of
https://github.com/ikawrakow/ik_llama.cpp.git
synced 2026-02-25 15:44:10 +00:00
Looks very promising - I get PP-512(LLaMA-3.1-8B) = 230 t/s on the Ryzen-7950X! This is faster than any other quant and ~40% faster than iq4_nl.