mirror of
https://github.com/ikawrakow/ik_llama.cpp.git
synced 2026-03-04 19:10:03 +00:00
server: improve speed of speculative decoding (#1119)
* server: improve speed of speculative decoding

  change logs
  rpc: add recompute spec dec fix

* Fix n_batch_size not set to context size for draft model

---------

Co-authored-by: firecoperana <firecoperana>
@@ -336,6 +336,10 @@ public:

    llama_pos pos_next() const;

    int n_tokens() const {
        return tokens.size();
    }

    // for debugging
    std::string str() const;

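The hunk above adds small accessors to a token-holding class: a token count (`n_tokens()`), the next position after the held tokens (`pos_next()`), and a debug string (`str()`). A minimal sketch of how such accessors typically behave — the class name `token_sequence` and its members are assumptions for illustration, not the actual ik_llama.cpp type:

    #include <cstdio>
    #include <string>
    #include <vector>

    // Hypothetical stand-in for the class patched in the hunk above.
    struct token_sequence {
        std::vector<int> tokens;  // token ids held by this sequence
        int pos_start = 0;        // position of the first token

        // position that would follow the last held token (mirrors pos_next())
        int pos_next() const { return pos_start + (int)tokens.size(); }

        // number of tokens currently held (mirrors n_tokens())
        int n_tokens() const { return (int)tokens.size(); }

        // human-readable summary for debugging (mirrors str())
        std::string str() const {
            return "n_tokens=" + std::to_string(n_tokens()) +
                   " pos_next=" + std::to_string(pos_next());
        }
    };

    int main() {
        token_sequence seq;
        seq.tokens = {11, 22, 33};
        std::printf("%s\n", seq.str().c_str());
        return 0;
    }

Keeping these as `const` accessors rather than exposing the vector directly lets the batch layout change without touching callers, which matters for a speculative-decoding path that repeatedly builds and trims draft-token batches.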