ik_llama.cpp/ggml
Iwan Kawrakow 9780ac4591 iq2_tn: AVX2 PP improvement
We now get PP-512 = 490.73 t/s for TriLM-3.9B on the Ryzen-5975WX.
For comparison, we have PP-512 = 636.61 t/s for Bitnet-3B quantized with iq2_bn.
Bitnet-3B actually has 3.43B parameters and TriLM-3.9B has 3.99B, so if PP
throughput scales inversely with parameter count we would expect about
3.43/3.99 * 636 = 546 t/s for TriLM-3.9B. This suggests that something in
iq2_tn is still not quite optimal.
2024-08-06 12:34:44 +03:00