Commit Graph

2 Commits

Author SHA1 Message Date
Samuel Oliveira Alves
ea94afe777 Speculative checkpoints for recurrent models (#1669)
* server: spec checkpoints for recurrent models

* fix: save/restore sampler state during speculative checkpoint

When speculative decoding rejects draft tokens and restores the
recurrent state checkpoint, the sampler (RNG, grammar, prev tokens)
must also be restored to maintain consistency. Without this, the
sampler state reflects the rejected draft tokens, leading to
potential divergence.

Uses common_sampler_clone() to snapshot the sampler before the
speculative batch decode, and restores it on rejection.

* server: snapshot recurrent state in tensor

* reset ngram mod state for rejected tokens

* server: refactor checkpoint state logic

* speculative: fix sampler for checkpoints

* recurrent model: implement recurrent kernel checkpoint

* recurrent model: refactor api

* spec: free rbudget before overwriting
2026-04-24 09:59:30 +02:00
firecoperana
1cb7e1bf39 spec : add self speculative decoding, ngram and refactor (#1261)
* spec : add self speculative decoding and ngram-mod and refactor

common : use common_ prefix for common library function

llama : use LLAMA_TOKEN_NULL

spec : add self speculative decoding (no draft model required) + refactor

spec : add ngram-mod

spec : various improvements ton ngram-map + docs

spec : fix the check-rate logic of ngram-simple

common : add common_speculative_is_compat()

spec : simplify time measurement using common_time_meas

refactor common_sampler_init

refactor common_token_to_piece

refactor and fix cur_p bug

clean up

* spec : remove check rate

* spec: show warnings instead of abort

---------

Co-authored-by: firecoperana <firecoperana>
Co-authored-by: Sascha Rogmann <59577610+srogmann@users.noreply.github.com>
2026-02-13 19:04:55 +01:00