mirror of
https://github.com/ikawrakow/ik_llama.cpp.git
synced 2026-02-20 05:04:11 +00:00
This change tricks it into doing the right thing^TM. Still quite a bit slower than split mode layer for the 8B LlaMA model. But for the 70B LlaMA it now beats split mode layer for TG: 28 t/s vs 24.4 t/s. PP is 627 t/s vs 744 t/s. In comparison, split mode "row" in mainline gets 484 t/s PP and 19.3 t/s TG.