ik_llama.cpp

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-02-25 23:54:10 +00:00

Files

Iwan Kawrakow 41c8200d08 iq1_tn: improve Zen4

PP-512 goes from 352 t/s to 485 t/s. With FA (flash attention) it goes from 380 t/s to 545 t/s.
TG-128 at 1 thread goes from 10.4 t/s to 12.4 t/s.
However, there seems to be a bottleneck somewhere, as
TG saturates at 8 threads.

2024-09-09 09:02:33 +03:00
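
To put the throughput figures from the commit message above into relative terms, here is a minimal sketch that computes the speedup factor for each benchmark. The numbers are copied from the commit message. The dictionary layout and the `speedup` helper are illustrative assumptions and are not part of ik_llama.cpp.

```python
# Throughput figures in tokens/s, copied from the commit message: (before, after).
# The labels and this data layout are illustrative, not taken from the repository.
results = {
    "PP-512": (352.0, 485.0),
    "PP-512 with FA": (380.0, 545.0),
    "TG-128, 1 thread": (10.4, 12.4),
}


def speedup(before: float, after: float) -> float:
    """Return the ratio of new throughput to old throughput."""
    return after / before


for name, (before, after) in results.items():
    factor = speedup(before, after)
    gain_percent = (factor - 1.0) * 100.0
    print(f"{name}: {factor:.2f}x ({gain_percent:.1f}% faster)")
```

The prompt-processing gains (about 1.4x) are noticeably larger than the single-thread token-generation gain (about 1.2x), which fits the commit's remark that token generation appears to be limited by something else.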

cmake

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

include

Adding iq1_tn - 1.6875 bpw for TriLM ternary models

2024-09-08 17:56:15 +03:00

src

iq1_tn: improve Zen4

2024-09-09 09:02:33 +03:00

.gitignore

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

CMakeLists.txt

Merge mainline - Aug 12 2024 (#17)

2024-08-12 15:14:32 +02:00