Mirror of https://github.com/ikawrakow/ik_llama.cpp.git, synced 2026-02-25 07:34:10 +00:00.
This works on the CPU. Prompt-processing (PP) performance is ~13% better at 16k tokens, and the compute buffer is quite a bit smaller.