ik_llama.cpp

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-04-27 09:53:40 +00:00

Files

Kawrakow 996e77047a Avoid ggml_get_rows if not necessary (#1160)

* Copy reduce result to other GPUs if necessary

* Avoid ggml_get_rows for TG

* For the output ops use the result of the split that ran on the main GPU

* More models

2026-01-20 15:38:21 +02:00

cmake

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

include

server: improve speed of speculative decoding (#1119)

2026-01-10 08:01:22 +02:00

src

Avoid ggml_get_rows if not necessary (#1160)

2026-01-20 15:38:21 +02:00

.gitignore

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

CMakeLists.txt

CUDA: compress-mode size (#1110)

2026-01-07 18:33:17 +02:00