ik_llama.cpp/ggml
Iwan Kawrakow e528505fc8 iq2_tn: NEON
For TriLM-3.9B running on the M2-Max we get PP-512 = 193.5 t/s,
TG-128 = 75.5 t/s. This is in line with what we have for
iq2_bn and 3.3B Bitnet.
2024-08-06 06:19:59 +02:00