ik_llama.cpp/github-data/pull_requests/43 - iq2_tn_ slightly faster PP on Zen4.md at 9484d150d8da02fa7a3e4cc253d153e7efd6bc64 - ik_llama.cpp


With this change we get PP512 = 494 t/s (using flash attention), up from 468 t/s (a ~5.6% improvement), running on a Ryzen-7950X CPU.

Compared to the initial IQ2_TN PR #13, the cumulative improvement is 15%.

Compared to TQ2_0 in llama.cpp, which has since been merged, we are now 80% faster.
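
For reference, the percentages above follow from simple throughput ratios. The sketch below is only a back-of-the-envelope check, not part of the PR: the helper names `speedup_percent` and `implied_baseline` are hypothetical, and the baselines it derives for PR #13 and TQ2_0 are implied by the quoted percentages rather than measured values.

```python
def speedup_percent(new_tps: float, old_tps: float) -> float:
    """Relative throughput gain of new_tps over old_tps, in percent."""
    return (new_tps / old_tps - 1.0) * 100.0


def implied_baseline(new_tps: float, percent_faster: float) -> float:
    """Throughput that new_tps would be `percent_faster` percent above.

    This only inverts the percentage; it is an estimate, not a measurement.
    """
    return new_tps / (1.0 + percent_faster / 100.0)


# Numbers quoted in the PR description (PP512, flash attention, Ryzen-7950X).
pp512_new = 494.0  # t/s with this change
pp512_old = 468.0  # t/s before this change

print(f"gain from this change: {speedup_percent(pp512_new, pp512_old):.1f}%")

# Assumption: the 15% and 80% figures are relative to the same 494 t/s result.
print(f"implied PP512 for initial IQ2_TN PR: {implied_baseline(pp512_new, 15.0):.0f} t/s")
print(f"implied PP512 for TQ2_0 in llama.cpp: {implied_baseline(pp512_new, 80.0):.0f} t/s")
```

The implied baselines are only as precise as the rounded percentages they come from, so they should be read as rough figures.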