ik_llama.cpp/examples at 7505165dee0c76ff5d9d8018787cc7981fc1cd0e - ik_llama.cpp - Public git mirror

ikawrakow/ik_llama.cpp

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-03-13 07:20:15 +00:00

Yap Sok Ann 7505165dee Fix truncated logprobs when streaming is off (#998)

The logic to skip the logprobs of the stop token was originally from
ggml-org/llama.cpp#2849, and was later modified as part of
ggml-org/llama.cpp#10643 to be applied only to STOP_TYPE_WORD.

The latter change wasn't included in #723. Then, after #958 got merged,
the logic got inadvertently applied to GLM-4.5/4.6 and Kimi K2,
resulting in truncated logprobs when streaming is off.

This commit reverts the logic from ggml-org/llama.cpp#2849, such that
the logprobs of the stop token will always be included in the response,
when logprobs is enabled. From testing, this matches with the behavior
of Fireworks inference server, for both chat completions and text
completions endpoints.

Also fix logprobs param handling for the text completion endpoint.
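The behavior change described above can be illustrated with a minimal sketch. It is not the server's actual code: the function name `logprobs_for_response` and its parameters are hypothetical, and the stop sequence is assumed to occupy the last `n_stop_tokens` generated tokens.

```python
from typing import List


def logprobs_for_response(
    token_logprobs: List[float],
    n_stop_tokens: int,
    skip_stop: bool,
) -> List[float]:
    """Pick the per-token logprobs to report for a non-streaming completion.

    Hypothetical helper for illustration only.
    skip_stop=True models the older behavior (stop-token logprobs dropped);
    skip_stop=False models the behavior after this commit (nothing dropped).
    """
    if skip_stop and n_stop_tokens > 0:
        # Clamp so that a stop sequence longer than the output yields an empty list.
        keep = max(len(token_logprobs) - n_stop_tokens, 0)
        return token_logprobs[:keep]
    return list(token_logprobs)


# Assumed example data: the last entry belongs to the stop token.
generated = [-0.11, -0.52, -0.03, -1.20]

old_style = logprobs_for_response(generated, n_stop_tokens=1, skip_stop=True)
new_style = logprobs_for_response(generated, n_stop_tokens=1, skip_stop=False)

print(len(old_style), len(new_style))
```

With the skip enabled the response carries one entry fewer than the number of generated tokens, which is the truncation the commit removes.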

2025-11-24 06:52:15 +01:00

Merge mainline - Aug 12 2024 (#17)

2024-08-12 15:14:32 +02:00

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

MoE fix for R4 quants (#170)

2025-01-12 13:19:14 +02:00

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809)

2024-06-13 00:41:52 +01:00

convert-llama2c-to-ggml

build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809)

2024-06-13 00:41:52 +01:00

cvector-generator

CUDA: set compute parameters via command line arguments (#910)

2025-11-07 07:11:23 +02:00

deprecation-warning

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

Merge mainline - Aug 12 2024 (#17)

2024-08-12 15:14:32 +02:00

Merge mainline - Aug 12 2024 (#17)

2024-08-12 15:14:32 +02:00

Merge vulkan code from mainline up to commit of 6/28/2025 (#563)

2025-07-02 08:49:42 +02:00

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

gguf-split : update (#444)

2025-05-23 08:07:42 +03:00

llama : allow pooled embeddings on any model (#7477)

2024-06-21 08:38:22 +03:00

Fix imatrix calculation for MLA models (#411)

2025-05-13 17:53:38 +03:00

Tool calls support from mainline (#723)

2025-09-01 08:38:49 +03:00

build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809)

2024-06-13 00:41:52 +01:00

Disable split mode "row" (#987)

2025-11-19 16:15:50 +01:00

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

add dry sampler (#513)

2025-06-19 10:24:53 +03:00

add dry sampler (#513)

2025-06-19 10:24:53 +03:00

add dry sampler (#513)

2025-06-19 10:24:53 +03:00

Port mdmd from mainline + Qwen2/2.5-VL support (#798)

2025-09-27 08:45:29 +02:00

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

Add vision support in llama-server (#901)

2025-11-05 10:43:46 +02:00

Tool calls support from mainline (#723)

2025-09-01 08:38:49 +03:00

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

More informative PPL readout line (#914)

2025-11-07 16:41:24 +02:00

Allow quantization of ffn_gate_inp (#896)

2025-11-05 10:44:32 +02:00

Disable experimental code that causes issues with MSVC (#707)

2025-08-19 18:09:49 +03:00

Merge mainline - Aug 12 2024 (#17)

2024-08-12 15:14:32 +02:00

Fix cuda init error in rpc (#957)

2025-11-14 06:59:54 +02:00

save-load-state

Merge mainline - Aug 12 2024 (#17)

2024-08-12 15:14:32 +02:00

Fix truncated logprobs when streaming is off (#998)

2025-11-24 06:52:15 +01:00

Merge mainline - Aug 12 2024 (#17)

2024-08-12 15:14:32 +02:00

Support --device and --device-draft parameter (#866)

2025-10-27 18:13:28 +02:00

sweep-bench: be able to set TG tokens via -n (#897)

2025-11-04 14:39:30 +02:00

Merge mainline - Aug 12 2024 (#17)

2024-08-12 15:14:32 +02:00

Merge mainline - Aug 12 2024 (#17)

2024-08-12 15:14:32 +02:00

base-translate.sh

build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809)

2024-06-13 00:41:52 +01:00

chat-13B.bat

Create chat-13B.bat (#592)

2023-03-29 20:21:09 +03:00

chat-13B.sh

build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809)

2024-06-13 00:41:52 +01:00

chat-persistent.sh

build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809)

2024-06-13 00:41:52 +01:00

chat-vicuna.sh

build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809)

2024-06-13 00:41:52 +01:00

chat.sh

build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809)

2024-06-13 00:41:52 +01:00

CMakeLists.txt

Port mdmd from mainline + Qwen2/2.5-VL support (#798)

2025-09-27 08:45:29 +02:00

convert_legacy_llama.py

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

json_schema_pydantic_example.py

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

json_schema_to_grammar.py

Tool calls support from mainline (#723)

2025-09-01 08:38:49 +03:00

llama.vim

llama.vim : added api key support (#5090)

2024-01-23 08:51:27 +02:00

llm.vim

llm.vim : stop generation at multiple linebreaks, bind to <F2> (#2879)

2023-08-30 09:50:55 +03:00

Miku.sh

build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809)

2024-06-13 00:41:52 +01:00

pydantic_models_to_grammar_examples.py

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

pydantic_models_to_grammar.py

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

reason-act.sh

build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809)

2024-06-13 00:41:52 +01:00

regex_to_grammar.py

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

server_embd.py

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

server-llama2-13B.sh

build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809)

2024-06-13 00:41:52 +01:00

ts-type-to-grammar.sh

JSON schema conversion: ⚡️ faster repetitions, min/maxLength for strings, cap number length (#6555)

2024-04-12 19:43:38 +01:00