server: improve speed of speculative decoding (#1119)

* server: improve speed of speculative decoding

change logs

rpc: add recompute

spec dec fix

* Fix n_batch_size not set to context size for draft model

---------

Co-authored-by: firecoperana <firecoperana>
firecoperana
2026-01-10 00:01:22 -06:00
committed by GitHub
parent 6695c6c945
commit c1931663ad
7 changed files with 164 additions and 135 deletions

@@ -484,7 +484,7 @@ bool server_sent_event(httplib::DataSink& sink, const json& data) {
data.dump(-1, ' ', false, json::error_handler_t::replace) +
"\n\n"; // required by RFC 8895 - A message is terminated by a blank line (two line terminators in a row).
-LOG_VERBOSE("data stream, to_send: %s", str.c_str());
+//LOG_VERBOSE("data stream, to_send: %s", str.c_str());
return sink.write(str.c_str(), str.size());
}
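
The hunk above only shows the tail of `server_sent_event`, so the following is a minimal standalone sketch of the same framing, not the project's code. It assumes the standard SSE `data: ` field prefix (the line that builds it is not visible in the hunk), takes an already-serialized payload instead of a `json` object, and uses hypothetical names (`format_sse_event`, `send_sse_event`, `write_fn`) in place of `httplib::DataSink`. The per-message log is left out, mirroring the change in this hunk, presumably to avoid formatting cost on every streamed message.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical helper: frame one SSE message from an already-serialized payload.
// The "data: " prefix is an assumption based on the SSE format; the blank line
// ("\n\n") terminates the message, as the comment in the hunk notes.
std::string format_sse_event(const std::string& payload) {
    return "data: " + payload + "\n\n";
}

// Hypothetical stand-in for a sink: any callable that accepts a byte range
// and reports whether the write succeeded.
using write_fn = std::function<bool(const char*, std::size_t)>;

// Frame the payload and hand it to the writer. There is deliberately no
// per-message logging on this path.
bool send_sse_event(const write_fn& write, const std::string& payload) {
    const std::string str = format_sse_event(payload);
    return write(str.c_str(), str.size());
}
```

A caller would pass a lambda that appends to a buffer or writes to a socket, for example `send_sse_event([&](const char* p, std::size_t n) { out.append(p, n); return true; }, body)`.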