ik_llama.cpp/ggml
Iwan Kawrakow 2b07aa3f2e q2_k_r4: AVX2
We get PP-512(LLaMA-3.1-8B) = 287 t/s.

Also cherry-picked the q3_k_r4 AVX2 adaptation that I somehow
forgot to push upstream.
2024-12-11 16:56:51 +02:00