ik_llama.cpp/ggml/include/ggml.h at 4dc97b187b36e4bb06ba4c2bf01db90a3d9f2738

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-04-29 02:41:47 +00:00

Iwan Kawrakow 5de1cf4885 Faster iq4_xs_r4 on Zen4

The trick is simply to prepare the Q8 block sums for
blocks of 32 as floats. This raises PP-512 (prompt processing,
512 tokens) throughput from 224 t/s to 254.6 t/s.
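
As a rough illustration of what "preparing the Q8 block sums for blocks of 32 as floats" could look like, here is a minimal scalar sketch. It is not the code from this commit: the struct `q8_block_sketch`, the function `q8_block_sums_f32`, and the choice to fold the block scale `d` into the stored sum are all assumptions made for the example, and the real kernel is vectorized for Zen4.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_BLOCK_SIZE 32 /* quants per block, as stated in the commit message */

/* Hypothetical block layout for illustration only; NOT the ggml Q8 struct. */
typedef struct {
    float  d;                      /* per-block scale (assumed) */
    int8_t qs[SKETCH_BLOCK_SIZE];  /* 8-bit quantized values */
} q8_block_sketch;

/*
 * Compute one float sum per block of 32 quants, scaled by the block scale.
 * Doing this once up front means a matrix-multiplication inner loop can
 * read a ready-made float instead of summing integers and converting
 * them every time the block is used.
 */
static void q8_block_sums_f32(const q8_block_sketch *blocks,
                              size_t n_blocks, float *sums) {
    for (size_t i = 0; i < n_blocks; ++i) {
        int32_t acc = 0;
        for (int j = 0; j < SKETCH_BLOCK_SIZE; ++j) {
            acc += blocks[i].qs[j];
        }
        /* Assumption: the scale is folded in here; the commit may store it differently. */
        sums[i] = blocks[i].d * (float)acc;
    }
}
```

The sums depend only on the quantized activations, so under these assumptions they can be computed once per activation row and reused for every weight row it is multiplied with.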

2024-12-08 15:44:49 +02:00
