ik_llama.cpp/ggml
Iwan Kawrakow e06c83c8ee iq2_bn_r4: simdify q8_K16 quantization (AVX2)
PP-512 becomes 834 t/s and TG-128 now saturates to the same
performance as iq2_bn for 4 threads.
2024-12-06 08:41:54 +02:00