mirror of
https://github.com/theroyallab/tabbyAPI.git
synced 2026-05-11 16:30:16 +00:00
- move tool config from template_vars to separate yml config - new per-gen stream collector used for both streaming and non-streaming requests to ensure logic is consistent for both - move responsibility for switching between phases to stream collector - collect tool calls during streaming and parse at the end of each gen - prevent streaming empty content spans (be nice to clients) - correctly aggregate usage stats for n>1 requests, always emit with last chunk in last gen to finish - collect logprobs in model wrapper and correctly handle logprobs for multi-token chars etc. - respect top_logprobs argument in request - handle a number of edge cases like <think> tag being part of held string, etc. - retain tool parsing and inference-abort fixes from #413, apply similar fix to non-stream request as well Still TODO: - testing and validation with more models and tool schemas (tested on Qwen so far) - enable JSON constraint for JSON tool models - possibly some pydantification - documentation