ik_llama.cpp/src
Kawrakow 7747000f3b DeepSeek TG optimizations for TG (#928)
* Fuse concat and copy into K cache
* Avoid ggml_cont() when n_token = 1

Combined effect: about +2% in TG performance with full GPU offload

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2025-11-10 09:52:07 +02:00