mirror of
https://github.com/ikawrakow/ik_llama.cpp.git
synced 2026-03-03 02:20:01 +00:00
This works on the CPU. Prompt processing (PP) performance is ~13% better for 16k tokens, and the compute buffer is quite a bit smaller.