server: improve speed of speculative decoding (#1119)

* server: improve speed of speculative decoding

change logs

rpc: add recompute

spec dec fix

* Fix n_batch_size not set to context size for draft model

---------

Co-authored-by: firecoperana <firecoperana>
2026-03-06 20:10:08 +00:00 · 2026-01-10 00:01:22 -06:00
parent 6695c6c945
commit c1931663ad
7 changed files with 164 additions and 135 deletions
--- a/examples/server/server-common.h
+++ b/examples/server/server-common.h
@@ -336,6 +336,10 @@ public:

    llama_pos pos_next() const;

+    int n_tokens() const {
+        return tokens.size();
+    }
+
    // for debugging
    std::string str() const;