ik_llama.cpp

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-03-13 23:40:09 +00:00

Files

firecoperana c1931663ad server: improve speed of speculative decoding (#1119)

* server: improve speed of speculative decoding

change logs

rpc: add recompute

spec dec fix

* Fix n_batch_size not set to context size for draft model

---------

Co-authored-by: firecoperana <firecoperana>

2026-01-10 08:01:22 +02:00

cmake

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

base64.hpp

llava : expose as a shared library for downstream projects (#3613)

2023-11-07 00:36:23 +03:00

build-info.cpp.in

build : link against build info instead of compiling against it (#3879)

2023-11-02 08:50:16 +02:00

chat-parser-xml-toolcall.cpp

fix kimi-k2 tool call (#996)

2025-11-24 06:51:16 +01:00

chat-parser-xml-toolcall.h

common: Generalized XML-style tool-call parsing with streaming support (#958)

2025-11-18 15:29:58 +01:00

chat-parser.cpp

Add back the fix for Kimi-K2 tool-call parsing issues (#1070)

2025-12-16 14:44:47 +01:00

chat-parser.h

Refactor chat and server file (#1062)

2025-12-15 08:27:20 +01:00

chat.cpp

fix grammar for Kimi-K2 (#1103)

2026-01-05 07:57:25 +02:00

chat.h

Refactor chat and server file (#1062)

2025-12-15 08:27:20 +01:00

CMakeLists.txt

Refactor chat and server file (#1062)

2025-12-15 08:27:20 +01:00

common.cpp

server: improve speed of speculative decoding (#1119)

2026-01-10 08:01:22 +02:00

common.h

Turn on graph reuse by default (#1094)

2025-12-27 08:27:16 +01:00

console.cpp

check C++ code with -Wmissing-declarations (#3184)

2023-09-15 15:38:27 -04:00

console.h

gguf : new file format with flexible meta data (beta) (#2398)

2023-08-21 23:07:43 +03:00

grammar-parser.cpp

Update grammar (#1023)

2025-11-30 18:45:38 +01:00

grammar-parser.h

Tool calls support from mainline (#723)

2025-09-01 08:38:49 +03:00

json-partial.cpp

common: Generalized XML-style tool-call parsing with streaming support (#958)

2025-11-18 15:29:58 +01:00

json-partial.h

Move minja and nlohmann/json to vendor (#802)

2025-09-27 09:12:35 +02:00

json-schema-to-grammar.cpp

Update grammar (#1023)

2025-11-30 18:45:38 +01:00

json-schema-to-grammar.h

common: Generalized XML-style tool-call parsing with streaming support (#958)

2025-11-18 15:29:58 +01:00

llguidance.cpp

Tool calls support from mainline (#723)

2025-09-01 08:38:49 +03:00

log.cpp

Refactor chat and server file (#1062)

2025-12-15 08:27:20 +01:00

log.h

Fix log issue for llama-cli (#1071)

2025-12-16 18:12:16 +01:00

ngram-cache.cpp

Fixed lookup compilation issues on Windows (#6273)

2024-03-24 14:21:17 +01:00

ngram-cache.h

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

regex-partial.cpp

Tool calls support from mainline (#723)

2025-09-01 08:38:49 +03:00

regex-partial.h

Tool calls support from mainline (#723)

2025-09-01 08:38:49 +03:00

sampling.cpp

Implement Adaptive-P Sampler (#1100)

2026-01-10 07:58:53 +02:00

sampling.h

Implement Adaptive-P Sampler (#1100)

2026-01-10 07:58:53 +02:00

speculative.cpp

Support --device and --device-draft parameter (#866)

2025-10-27 18:13:28 +02:00

speculative.h

Port universal assisted decoding to llama-server (#699)

2025-08-18 09:22:23 +03:00

train.cpp

train : change default FA argument (#7528)

2024-05-25 15:22:35 +03:00

train.h

sync : ggml (backend v2) (#3912)

2023-11-13 14:16:23 +02:00