ik_llama.cpp/examples/server/server-context.cpp at ea94afe777e8dbfbd7ffddd3b33966db68eca20b

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-05-12 08:56:28 +00:00


Samuel Oliveira Alves ea94afe777 Speculative checkpoints for recurrent models (#1669)

* server: spec checkpoints for recurrent models

* fix: save/restore sampler state during speculative checkpoint

When speculative decoding rejects draft tokens and restores the
recurrent state checkpoint, the sampler (RNG, grammar, prev tokens)
must also be restored to maintain consistency. Without this, the
sampler state still reflects the rejected draft tokens, and later
sampling can diverge from the restored model state.

Uses common_sampler_clone() to snapshot the sampler before the
speculative batch decode, and restores it on rejection.
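
The snapshot-and-restore pattern described above can be sketched in isolation. This is a minimal illustration, not the code from this commit: `ToySampler` and `speculative_step` are hypothetical names, and a plain struct copy stands in for `common_sampler_clone()`. It assumes the sampler state is just an RNG plus the list of previously accepted tokens (grammar state is omitted).

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical stand-in for the real sampler: an RNG plus accepted-token history.
struct ToySampler {
    std::mt19937 rng;
    std::vector<int32_t> prev;

    explicit ToySampler(uint32_t seed) : rng(seed) {}

    // Draw a token id in [0, n_vocab) and record it; this advances the RNG.
    int32_t sample_and_accept(int32_t n_vocab) {
        std::uniform_int_distribution<int32_t> dist(0, n_vocab - 1);
        int32_t tok = dist(rng);
        prev.push_back(tok);
        return tok;
    }
};

// Run one speculative step over n_draft tokens.
// The `accept` flag is an assumption standing in for the real verification result.
// Returns true if the draft was kept, false if the sampler was rolled back.
bool speculative_step(ToySampler & smpl, int n_draft, bool accept) {
    // Snapshot before the speculative batch (plays the role of common_sampler_clone()).
    ToySampler checkpoint = smpl;

    for (int i = 0; i < n_draft; ++i) {
        smpl.sample_and_accept(32000);
    }

    if (!accept) {
        // Rejection: discard RNG advances and token history from the draft.
        smpl = checkpoint;
    }
    return accept;
}
```

After a rejected step, a sampler seeded identically to a fresh one produces the same next token as the fresh one, which is the consistency property the commit message is concerned with.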

* server: snapshot recurrent state in tensor

* reset ngram mod state for rejected tokens

* server: refactor checkpoint state logic

* speculative: fix sampler for checkpoints

* recurrent model: implement recurrent kernel checkpoint

* recurrent model: refactor api

* spec: free rbudget before overwriting

2026-04-24 09:59:30 +02:00

180 KiB
