Mirror of https://github.com/ikawrakow/ik_llama.cpp.git, synced 2026-05-12 17:05:57 +00:00.
* wip: separate llama_context for MTP with graph reuse
* wip: fix KV cache desync with separate MTP context
* refactor: remove dead MTP logic code, encapsulate KV mirroring
* mtp-context: derive args directly from the main model's context
* mtp: fix KV cache positions
* clean up small comments
* minor refactor for context shift
163 KiB