ik_llama.cpp/ggml/include/ggml.h
Iwan Kawrakow 5de1cf4885 Faster iq4_xs_r4 on Zen4
The trick is to simply prepare the Q8 block sums for
blocks of 32 as floats. This brings PP-512 up to 254.6 t/s
from 224 t/s.
2024-12-08 15:44:49 +02:00
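
A minimal sketch of what "Q8 block sums for blocks of 32 as floats" could look like, for illustration only. The struct layout, the names `q8_block_sketch` and `q8_prepare_block_sums`, the super-block size of 256, and the choice to fold the scale `d` into each sum are all assumptions. None of them are taken from ggml.h or from the commit itself.

```c
#include <stdint.h>

#define QK    256   /* quants per super-block (assumed) */
#define BLOCK 32    /* sub-block size named in the commit message */

/* Hypothetical layout for illustration; not the struct defined in ggml.h. */
typedef struct {
    float  d;                   /* scale shared by the whole super-block */
    int8_t qs[QK];              /* quantized activations */
    float  bsums[QK / BLOCK];   /* one precomputed float sum per 32 quants */
} q8_block_sketch;

/* Precompute the per-32 sums once, at quantization time, so that a
 * matrix-multiplication kernel can read them directly as floats instead
 * of re-summing and converting integers in its inner loop. */
static void q8_prepare_block_sums(q8_block_sketch *b) {
    for (int ib = 0; ib < QK / BLOCK; ++ib) {
        int32_t sum = 0;
        for (int j = 0; j < BLOCK; ++j) {
            sum += b->qs[ib * BLOCK + j];
        }
        /* Assumption: the scale is folded in here; the commit does not say. */
        b->bsums[ib] = b->d * (float)sum;
    }
}
```

The sums depend only on the activations, so under these assumptions they are computed once per Q8 block and reused for every weight row multiplied against it.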

95 KiB