ik_llama.cpp

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-02-25 23:54:10 +00:00

Files

Iwan Kawrakow 9780ac4591 iq2_tn: AVX2 PP improvement

We now get PP-512 = 490.73 t/s for TriLM-3.9B on the Ryzen-5975WX.
We have PP-512 = 636.61 t/s for Bintnet-3B quantized with iq2_bn.
Bintnet-3B is actually 3.4B, TriLM-3.9B is 3.99B, so we would
expect 3.43/3.99 * 636 = 546 t/s, so it seems we still have something
that is not quite optimal in iq2_tn.

2024-08-06 12:34:44 +03:00

cmake

Merge mainline llama.cpp (#3 )

2024-07-27 07:55:01 +02:00

include

iq2_tn: TriLM specific 2.0625 bpw quantization

2024-08-05 14:22:05 +03:00

src

iq2_tn: AVX2 PP improvement

2024-08-06 12:34:44 +03:00

.gitignore

Merge mainline llama.cpp (#3 )

2024-07-27 07:55:01 +02:00

CMakeLists.txt

Merge mainline llama.cpp (#3 )

2024-07-27 07:55:01 +02:00