mirror of
https://github.com/ikawrakow/ik_llama.cpp.git
synced 2026-03-11 14:30:02 +00:00
* Fuse concat and copy into K cache * Avoid ggml_cont() when n_token = 1 Combined effect: about +2% in TG performance with full GPU offload Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>