mirror of https://github.com/ikawrakow/ik_llama.cpp.git (synced 2026-02-25 15:44:10 +00:00)
The trick is simply to precompute the Q8 block sums for blocks of 32 as floats. This raises PP-512 from 224 t/s to 254.6 t/s.
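As a rough illustration of the idea, here is a minimal scalar sketch in C. It assumes a made-up block layout (`q8_block_sketch`) and made-up helper names (`quantize_q8_with_sums`, `dot_block_with_min`); none of these are the repository's actual types or kernels, which are SIMD code. The sketch stores `d * sum(qs)` as a float at quantization time, so a weight format with a per-block offset (`w[i] = dw * qw[i] + mw`, also an assumption here) can apply the offset with one multiply per block instead of re-summing the Q8 values inside the dot product.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QK 32 /* block size named in the text */

/* Hypothetical layout for illustration only; not the repository's struct. */
typedef struct {
    float  d;      /* scale: x[i] ~= d * qs[i] */
    float  sum;    /* precomputed d * (qs[0] + ... + qs[QK-1]), kept as float */
    int8_t qs[QK]; /* quantized values in [-127, 127] */
} q8_block_sketch;

/* Quantize n floats (n a multiple of QK) and store each block's sum as a float. */
static void quantize_q8_with_sums(const float *x, q8_block_sketch *y, size_t n) {
    assert(n % QK == 0);
    for (size_t b = 0; b < n / QK; ++b) {
        const float *xb = x + b * QK;

        /* Largest magnitude in the block determines the scale. */
        float amax = 0.0f;
        for (int i = 0; i < QK; ++i) {
            float a = xb[i] < 0.0f ? -xb[i] : xb[i];
            if (a > amax) amax = a;
        }
        float d  = amax / 127.0f;
        float id = d > 0.0f ? 1.0f / d : 0.0f;

        /* Round to nearest and accumulate the integer sum along the way. */
        int isum = 0;
        for (int i = 0; i < QK; ++i) {
            float v = xb[i] * id;
            int   q = (int)(v + (v >= 0.0f ? 0.5f : -0.5f));
            y[b].qs[i] = (int8_t)q;
            isum += q;
        }
        y[b].d   = d;
        y[b].sum = d * (float)isum; /* the block sum, prepared once as a float */
    }
}

/* Dot product of one weight block w[i] = dw * qw[i] + mw with one Q8 block.
 * The offset term mw * sum(x) reuses the precomputed float sum. */
static float dot_block_with_min(float dw, float mw, const uint8_t *qw,
                                const q8_block_sketch *y) {
    int acc = 0;
    for (int i = 0; i < QK; ++i) acc += (int)qw[i] * (int)y->qs[i];
    return dw * y->d * (float)acc + mw * y->sum;
}
```

The sum is computed once per activation block during quantization and then reused for every weight row that block is multiplied with, which is where a saving of this kind would come from.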