server: exclude thinking tokens when finding the slot (#1079)

refactor find slot

enable by default

Fix load prompt

rename variables

Co-authored-by: firecoperana <firecoperana>
firecoperana
2025-12-22 02:46:45 -06:00
committed by GitHub
parent 21fc9322f9
commit 5562605076
8 changed files with 247 additions and 33 deletions


@@ -177,6 +177,7 @@ struct server_prompt {
server_tokens tokens;
int n_kept_prompt;
int n_discarded_prompt;
thinking_tokens think_tokens;
std::vector<uint8_t> data;
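The commit title says thinking tokens are excluded when finding the slot, and the hunk above adds a `think_tokens` field to the cached `server_prompt`. Below is a minimal sketch of one way such matching could work. It assumes three things: the thinking block occupies one contiguous range of the cached tokens, the client re-sends earlier turns without their reasoning (as some chat templates do), and the best slot is the one with the longest common token prefix. The names `think_span`, `cached_slot`, `strip_thinking`, `common_prefix` and `find_slot` are hypothetical. They are not the server's actual API, and the real `thinking_tokens` type and slot-selection logic in this commit may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using llama_token = int32_t;  // assumption: token ids are 32-bit integers

// Hypothetical stand-in for thinking_tokens: half-open range [begin, end)
// of the cached prompt occupied by the model's reasoning block.
struct think_span {
    size_t begin = 0;
    size_t end = 0;  // begin == end means "no thinking tokens"
};

// Hypothetical cached-slot record: tokens plus where the thinking block sits.
struct cached_slot {
    std::vector<llama_token> tokens;
    think_span think;
};

// Returns a copy of the cached tokens with the thinking span removed.
static std::vector<llama_token> strip_thinking(const cached_slot & s) {
    assert(s.think.begin <= s.think.end);
    std::vector<llama_token> out;
    out.reserve(s.tokens.size());
    for (size_t i = 0; i < s.tokens.size(); ++i) {
        if (i >= s.think.begin && i < s.think.end) {
            continue;  // skip tokens inside the reasoning block
        }
        out.push_back(s.tokens[i]);
    }
    return out;
}

// Number of leading tokens two sequences share.
static size_t common_prefix(const std::vector<llama_token> & a,
                            const std::vector<llama_token> & b) {
    const size_t n = std::min(a.size(), b.size());
    size_t i = 0;
    while (i < n && a[i] == b[i]) {
        ++i;
    }
    return i;
}

// Picks the slot whose cached prompt (thinking tokens excluded) shares the
// longest prefix with the incoming prompt; returns -1 if nothing matches.
static int find_slot(const std::vector<cached_slot> & slots,
                     const std::vector<llama_token> & prompt) {
    int best = -1;
    size_t best_len = 0;
    for (size_t i = 0; i < slots.size(); ++i) {
        const size_t len = common_prefix(strip_thinking(slots[i]), prompt);
        if (len > best_len) {
            best_len = len;
            best = static_cast<int>(i);
        }
    }
    return best;
}
```

For example, a cached prompt `{1, 2, 90, 91, 3, 4}` with thinking tokens at positions 2 and 3 matches the new prompt `{1, 2, 3, 4, 7}` on 4 tokens once the reasoning is stripped. Without stripping, it would match on only 2 tokens.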