mirror of
https://github.com/ikawrakow/ik_llama.cpp.git
synced 2026-03-11 22:40:01 +00:00
Just scalar and AVX2 for now. PP-512 is even faster (325 t/s on the Ryzen-7950X, 404 t/s on the Ryzen-5975WX). We lose ~6-7% for TG because it is memory bound and the model is 10% larger.
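As a rough sanity check on the TG number, here is a minimal sketch that assumes token generation is purely bandwidth-limited, so t/s is inversely proportional to model size. That is an idealization, not something measured in this commit, and the helper name `memory_bound_slowdown` is hypothetical.

```python
def memory_bound_slowdown(size_ratio: float) -> float:
    """Fractional drop in tokens/s if throughput scales as 1 / model size.

    Assumption: every generated token streams the full set of weights from
    memory once, and nothing else limits speed.
    """
    return 1.0 - 1.0 / size_ratio


# A model that is 10% larger (size_ratio = 1.10) under this idealized model.
loss = memory_bound_slowdown(1.10)
print(f"{loss:.1%}")  # prints 9.1%
```

Under this assumption a 10% larger model would cost about 9% of TG speed, so the reported ~6-7% loss sits below the purely memory-bound estimate.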
14 KiB