ik_llama.cpp

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-05-12 00:50:22 +00:00

Files

Kawrakow 4872f2f57e Q3_K_R4 (#134)

* q3_k_r4: Zen4 works, but not as good as it should be

238 t/s, so slightly slower than q6_k_r4.

* q3_k_r4: NEON

We get PP-512 (LLaMA-3.1-8B) = 106.9 t/s.
This is 1.93X faster than q3_K_S!
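As a rough cross-check, the NEON figures imply a q3_K_S baseline of about 55 t/s on the same PP-512 run. The short Python sketch below does that arithmetic. It assumes "1.93X faster" is the ratio of the two PP-512 throughputs (q3_k_r4 / q3_K_S); the baseline it prints is derived from the commit message, not a measured result, and the helper name `implied_baseline` is hypothetical.

```python
# Derive the q3_K_S throughput implied by the NEON numbers in the commit message.
# Assumption: "1.93X faster" means q3_k_r4_tps / q3_K_S_tps on the same benchmark.
Q3_K_R4_PP512 = 106.9  # t/s, LLaMA-3.1-8B, NEON (stated in the commit message)
SPEEDUP = 1.93         # stated speedup of q3_k_r4 over q3_K_S


def implied_baseline(fast_tps: float, speedup: float) -> float:
    """Hypothetical helper: throughput of the slower variant given the faster one and the ratio."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return fast_tps / speedup


if __name__ == "__main__":
    baseline = implied_baseline(Q3_K_R4_PP512, SPEEDUP)
    # Prints the derived estimate rounded to one decimal place.
    print(f"implied q3_K_S PP-512: {baseline:.1f} t/s")
```

Running it prints `implied q3_K_S PP-512: 55.4 t/s`, which is only as accurate as the rounded 1.93 ratio allows.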

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>

2024-12-11 11:19:00 +01:00

llama.h

Q3_K_R4 (#134)

2024-12-11 11:19:00 +01:00