mirror of
https://github.com/ikawrakow/ik_llama.cpp.git
synced 2026-04-22 07:29:23 +00:00
We get
PP-512(LLaMA-3.1-8B) = 225 t/s
TG-128(LLaMA-3.1-8B) = 15.2 t/s

We could do slightly better by arranging the bits in blocks of 128 instead of 32. This saves 4 permutes per 256 weights and results in PP-512 = 230 t/s and TG-128 = 15.65 t/s. But for now we leave it the way it is.
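
To put the two sets of numbers side by side, here is a minimal sketch of the relative-gain arithmetic. The helper name `relative_gain_percent` is hypothetical and is not part of the repository; only the throughput figures come from the measurements quoted above.

```cpp
#include <cassert>

// Hypothetical helper (not from the repository): relative throughput gain,
// in percent, when going from `before` t/s to `after` t/s.
inline double relative_gain_percent(double before, double after) {
    assert(before > 0.0);  // throughput must be positive for the ratio to make sense
    return (after / before - 1.0) * 100.0;
}

// Figures quoted in the commit message, in tokens per second.
constexpr double kPP512Blocks32  = 225.0;   // PP-512, bits arranged in blocks of 32
constexpr double kPP512Blocks128 = 230.0;   // PP-512, bits arranged in blocks of 128
constexpr double kTG128Blocks32  = 15.2;    // TG-128, bits arranged in blocks of 32
constexpr double kTG128Blocks128 = 15.65;   // TG-128, bits arranged in blocks of 128
```

Calling `relative_gain_percent(kPP512Blocks32, kPP512Blocks128)` gives about 2.2%, and `relative_gain_percent(kTG128Blocks32, kTG128Blocks128)` gives about 3.0%, which is consistent with calling the improvement slight.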