Mirror of https://github.com/ikawrakow/ik_llama.cpp.git (synced 2026-04-29 02:41:47 +00:00)
The trick is to precompute the Q8 block sums as floats, one sum per block of 32. This raises PP-512 from 224 t/s to 254.6 t/s.
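A minimal sketch of the idea, not the code from this commit: the struct names, field layout, and helper functions below are hypothetical. It assumes the block sum is useful because some weight formats carry a per-block minimum `m`, so the dot product needs a term `m * sum(activations)`. Storing that sum as a float at quantization time means the inner loop never has to recompute or convert it.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

constexpr int kBlock = 32;  // block size named in the commit message

// Hypothetical Q8 activation block (NOT the actual ik_llama.cpp layout).
struct BlockQ8 {
    float  d;          // scale: x[i] ~= d * q[i]
    float  s;          // precomputed block sum, stored as a float: d * sum(q[i])
    int8_t q[kBlock];  // quantized values
};

// Hypothetical weight block with a scale and a minimum: w[i] ~= d * q[i] + m.
struct BlockW {
    float   d;
    float   m;
    uint8_t q[kBlock];
};

// Quantize 32 floats to int8 and compute the float block sum once, up front.
inline BlockQ8 quantize_block(const float* x) {
    assert(x != nullptr);
    float amax = 0.0f;
    for (int i = 0; i < kBlock; ++i) amax = std::max(amax, std::fabs(x[i]));

    BlockQ8 b{};
    b.d = amax / 127.0f;
    const float inv_d = b.d > 0.0f ? 1.0f / b.d : 0.0f;

    int isum = 0;
    for (int i = 0; i < kBlock; ++i) {
        b.q[i] = static_cast<int8_t>(std::lround(x[i] * inv_d));
        isum += b.q[i];
    }
    b.s = b.d * static_cast<float>(isum);  // the "block sum as a float"
    return b;
}

// Dot product of one weight block with one activation block:
//   sum_i (w.d*w.q[i] + w.m) * (a.d*a.q[i])
//     = w.d * a.d * sum_i w.q[i]*a.q[i]  +  w.m * a.s
// The second term reuses the precomputed a.s directly.
inline float dot_block(const BlockW& w, const BlockQ8& a) {
    int isum = 0;
    for (int i = 0; i < kBlock; ++i) {
        isum += static_cast<int>(w.q[i]) * static_cast<int>(a.q[i]);
    }
    return w.d * a.d * static_cast<float>(isum) + w.m * a.s;
}
```

In this sketch the cost of the sum is paid once per activation block in `quantize_block`, while `dot_block` runs once per weight block that multiplies it, so the saving scales with the number of weight rows.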
95 KiB