mirror of
https://github.com/ikawrakow/ik_llama.cpp.git
synced 2026-02-27 16:44:21 +00:00
* Faster iq4_xs_r4 on Zen4 The trick is to simply prepare the Q8 block sums for blocks of 32 as floats. This brings PP-512 up to 254.6 t/s from 224 t/s. * Fix broken matrix x vector product on Zen4 --------- Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>