This patch addresses two latent bugs in examples/speculative/speculative.cpp
that prevented llama-speculative.exe from running at all with greedy sampling
(temp=0) or from producing rejection-sampling output (temp>0):
1. Line 191: `params.sparams.grammar = { COMMON_GRAMMAR_TYPE_NONE, "" };`
invokes `common_grammar(type, grammar)`, which asserts
`type != NONE || !grammar.empty()`. Both disjuncts are false for the
intended-to-be-empty grammar, so every speculative run hits a hard
`GGML_ASSERT` in common/sampling.h:63 immediately after model load.
Fix: default-construct via `common_grammar{}`, which bypasses the
field-init constructor and its assert.
2. Lines 293-294: `GGML_ASSERT(dist_tgt.sorted)` and
`GGML_ASSERT(dist_dft.sorted)` fire whenever the draft sampler does
not set the `.sorted` flag, which most modern sampler paths do not.
The ~10 lines that follow re-sort both distributions by token id
explicitly, so the asserts are redundant as well as incorrect.
Fix: replace the asserts with an explanatory comment.
After both fixes, `llama-speculative.exe` runs to completion. The
acceptance-rate measurement at temp=0 still looks suspicious (0%
across same-family draft/target pairs), but that is a different
issue out of scope for this PR.
Tested with Qwen3-0.6B-IQ4_XS drafting for Qwen3-1.7B-IQ4_XS, both base
models from `bartowski/Qwen_Qwen3-*-GGUF`, on Windows with an
ik_llama.cpp build at the HEAD of the windows-mingw-default-win10
branch (itself a follow-up to PR #1755).