Mirror of https://github.com/ikawrakow/ik_llama.cpp.git (synced 2026-04-20 22:49:31 +00:00)
This works on the CPU. Prompt-processing (PP) performance is ~13% better at 16k tokens, and the compute buffer is quite a bit smaller.