ik_llama.cpp

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-03-11 14:30:02 +00:00

Files

Kawrakow 7747000f3b DeepSeek TG optimizations for TG (#928)

* Fuse concat and copy into K cache
* Avoid ggml_cont() when n_token = 1

Combined effect: about +2% in TG performance with full GPU offload

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>

2025-11-10 09:52:07 +02:00

cmake

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

include

CUDA: set compute parameters via command line arguments (#910)

2025-11-07 07:11:23 +02:00

src

DeepSeek TG optimizations for TG (#928)

2025-11-10 09:52:07 +02:00

.gitignore

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

CMakeLists.txt

Disable CUDA fusion by default for now (#903)

2025-11-05 10:58:12 +02:00