ik_llama.cpp/examples/server/server-context.cpp
Samuel Oliveira Alves 67e6346225 Support for Qwen 3.5 MTP (dense models only) (#1698)
* qwen-mtp: add dense mtp for one draft

* add support for smaller qwen mtp commit

* qwen-mtp: fix graph for qwen dense variants

* Squashed commit of the following:

commit a92a154b38c7fddc84460f8852c900f8d6ce907e
Author: SamuelOliveirads <samueloliveira32df@gmail.com>
Date:   Mon Apr 20 13:30:21 2026 -0300

    recurrent model: refactor api

commit dfac8f19f6
Author: SamuelOliveirads <samueloliveira32df@gmail.com>
Date:   Mon Apr 20 12:22:29 2026 -0300

    recurrent model: implement recurrent kernel checkpoint

commit 9c44b117f9
Author: SamuelOliveirads <samueloliveira32df@gmail.com>
Date:   Sat Apr 18 11:52:39 2026 -0300

    speculative: fix sampler for checkpoints

commit e7006393bc
Author: SamuelOliveirads <samueloliveira32df@gmail.com>
Date:   Fri Apr 17 14:08:25 2026 -0300

    server: refactor checkpoint state logic

commit 57eabf04df
Merge: dc4797b7 64234e3c
Author: SamuelOliveirads <samueloliveira32df@gmail.com>
Date:   Fri Apr 17 13:53:41 2026 -0300

    Merge branch 'main' into fix/hybrid-cache-speculative

commit dc4797b723
Author: SamuelOliveirads <samueloliveira32df@gmail.com>
Date:   Fri Apr 17 13:12:40 2026 -0300

    reset ngram mod state for rejected tokens

commit 8ff2d943a3
Author: SamuelOliveirads <samueloliveira32df@gmail.com>
Date:   Fri Apr 17 13:08:04 2026 -0300

    server: snapshot recurrent state in tensor

commit d93dfb5e6b
Author: SamuelOliveirads <samueloliveira32df@gmail.com>
Date:   Thu Apr 16 22:36:37 2026 -0300

    fix: save/restore sampler state during speculative checkpoint

    When speculative decoding rejects draft tokens and restores the
    recurrent state checkpoint, the sampler (RNG, grammar, prev tokens)
    must also be restored to maintain consistency. Without this, the
    sampler state reflects the rejected draft tokens, leading to
    potential divergence.

    Uses common_sampler_clone() to snapshot the sampler before the
    speculative batch decode, and restores it on rejection.
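
The snapshot-and-restore pattern described above can be illustrated with a minimal sketch. This is not the server code: `toy_sampler`, `spec_checkpoint`, `take_checkpoint`, `restore_checkpoint`, and `speculative_step` are hypothetical names invented for illustration, and a plain struct copy stands in for `common_sampler_clone()`. Grammar state is left out for brevity.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical stand-in for the sampler state named in the commit message:
// an RNG plus the history of previously sampled tokens.
struct toy_sampler {
    std::mt19937 rng;
    std::vector<int32_t> prev;

    explicit toy_sampler(uint32_t seed) : rng(seed) {}

    // Draw a token id in [0, n_vocab), record it, and advance the RNG.
    int32_t sample(int32_t n_vocab) {
        assert(n_vocab > 0);
        const int32_t tok = static_cast<int32_t>(rng() % static_cast<uint32_t>(n_vocab));
        prev.push_back(tok);
        return tok;
    }
};

// Hypothetical checkpoint: a full copy of the sampler, taken before the
// speculative batch is decoded.
struct spec_checkpoint {
    toy_sampler sampler;
};

spec_checkpoint take_checkpoint(const toy_sampler & s) {
    return spec_checkpoint{s};
}

void restore_checkpoint(toy_sampler & s, const spec_checkpoint & cp) {
    s = cp.sampler; // RNG and token history both roll back
}

// Sample n_draft speculative tokens. If the draft is rejected, roll the
// sampler back so it no longer reflects the rejected tokens. Returns the
// next token sampled after the speculative step.
int32_t speculative_step(toy_sampler & s, int n_draft, bool accepted, int32_t n_vocab) {
    const spec_checkpoint cp = take_checkpoint(s);
    for (int i = 0; i < n_draft; ++i) {
        s.sample(n_vocab);
    }
    if (!accepted) {
        restore_checkpoint(s, cp);
    }
    return s.sample(n_vocab);
}
```

In this sketch, a rejected draft leaves the sampler exactly where a run without speculation would be: the next token matches what the same seed produces without any draft. Without the restore, the RNG would have advanced past the rejected draws and the token history would still contain them.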

commit d670cf85cd
Author: SamuelOliveirads <samueloliveira32df@gmail.com>
Date:   Thu Apr 16 21:53:52 2026 -0300

    server: spec checkpoints for recurrent models

* server: fix context leak between requests

* qwen3: allow mtp to run with split graph

* qwen3 mtp: select rows before the ffn
2026-04-28 07:47:50 +02:00

182 KiB