Commit Graph

15 Commits

Author SHA1 Message Date
turboderp
27d29209c6 Tools: Add Gemma4 parser 2026-04-10 00:16:58 +02:00
turboderp
79d581e1f5 OAI endpoints: More rework
- remove disconnect_task
- move disconnect logic to a per-request handler that wraps cleanup operation and directly polls the request state with throttling
- exclusively signal disconnect with CancelledError
- rework completions endpoint to follow same approach as chat completions, share some code
- refactor OAI endpoints a bit
- correct behavior for batched completion requests
- make sure logprobs work for completion and streaming completion requests
- more tests
2026-04-02 01:26:44 +02:00
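
The disconnect handling described in 79d581e1f5 (a per-request wrapper that polls the request state at a throttled rate and signals disconnect only through CancelledError) might look roughly like the sketch below. This is not the code from the commit: `run_until_disconnect`, `is_disconnected` and `poll_interval` are hypothetical names, and only the standard `asyncio` library is assumed.

```python
import asyncio
from typing import Awaitable, Callable


async def run_until_disconnect(
    work: Awaitable,
    is_disconnected: Callable[[], Awaitable[bool]],
    poll_interval: float = 0.05,
):
    """Run `work`, polling `is_disconnected` at most once per `poll_interval`.

    Hypothetical helper: if the client has gone away, the work is cancelled
    and CancelledError is raised to the caller.
    """
    task = asyncio.ensure_future(work)
    try:
        while True:
            # Wait for the work to finish, waking up periodically to poll.
            done, _ = await asyncio.wait({task}, timeout=poll_interval)
            if done:
                return task.result()
            if await is_disconnected():
                raise asyncio.CancelledError("client disconnected")
    finally:
        # Single cleanup path for success, failure and disconnect.
        if not task.done():
            task.cancel()
            try:
                await task
            except asyncio.CancelledError:
                pass
```

In a web framework, `is_disconnected` would typically be a thin wrapper around whatever the request object exposes for this purpose; the polling interval trades disconnect latency against overhead.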
turboderp
0409064028 Tools: Refactor and further simplify tool parsing
- remove ToolConfig, reduce to a single `tool_format` argument and hard-code extra args like start/end tokens
- dispatch to short, self-contained (and probably easily vibe coded) parser for each model type
- remove autodetection (seems infeasible since parsing effectively starts during streaming, and there is overlap between tool formats for different models)
- streamline xml parser and dedicate to qwen3_coder models
- add parsers for glm4.x, minimax-m2.x and mistral (seems shaky, probably because mistralai don't validate against hf)
- update docs
2026-04-01 00:07:44 +02:00
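
As an illustration of the approach in 0409064028 (a single `tool_format` argument that dispatches to a short, self-contained parser, with no autodetection), here is a minimal sketch. The two formats shown, `json_list` and `tagged`, are invented for the example and do not correspond to any real model's tool-call syntax; the function names are likewise hypothetical.

```python
import json
import re
from typing import Callable

ToolCall = dict  # {"name": str, "arguments": dict}


def parse_json_calls(text: str) -> list[ToolCall]:
    """Hypothetical format: the model emits a JSON list of calls."""
    return [
        {"name": call["name"], "arguments": call.get("arguments", {})}
        for call in json.loads(text)
    ]


def parse_tagged_calls(text: str) -> list[ToolCall]:
    """Hypothetical format: <call name="...">{json arguments}</call>."""
    pattern = re.compile(r'<call name="([^"]+)">(.*?)</call>', re.DOTALL)
    return [
        {"name": name, "arguments": json.loads(args)}
        for name, args in pattern.findall(text)
    ]


# One parser per format; the caller must name the format explicitly.
PARSERS: dict[str, Callable[[str], list[ToolCall]]] = {
    "json_list": parse_json_calls,
    "tagged": parse_tagged_calls,
}


def parse_tool_calls(text: str, tool_format: str) -> list[ToolCall]:
    try:
        parser = PARSERS[tool_format]
    except KeyError:
        raise ValueError(f"unknown tool_format: {tool_format!r}") from None
    return parser(text)
```

Keeping each parser independent means a fix for one model family cannot change the behavior of another, at the cost of some duplicated code.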
turboderp
179479199b Rework tool calls and OAI chat completions
- move tool config from template_vars to separate yml config
- new per-gen stream collector used for both streaming and non-streaming requests to ensure logic is consistent for both
- move responsibility for switching between phases to stream collector
- collect tool calls during streaming and parse at the end of each gen
- prevent streaming empty content spans (be nice to clients)
- correctly aggregate usage stats for n>1 requests, always emit with last chunk in last gen to finish
- collect logprobs in model wrapper and correctly handle logprobs for multi-token chars etc.
- respect top_logprobs argument in request
- handle a number of edge cases like <think> tag being part of held string, etc.
- retain tool parsing and inference-abort fixes from #413, apply similar fix to non-stream request as well

Still TODO:
- testing and validation with more models and tool schemas (tested on Qwen so far)
- enable JSON constraint for JSON tool models
- possibly some pydantification
- documentation
2026-03-30 00:22:55 +02:00
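
The usage aggregation mentioned in 179479199b (stats combined across n>1 generations and emitted only with the last chunk of the last generation to finish) could be sketched as follows. The class and field names are hypothetical, and the sketch assumes that prompt tokens are counted once while completion tokens are summed across generations; the commit message does not spell out either detail.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UsageAggregator:
    """Hypothetical helper that accumulates usage across n generations."""

    n: int
    prompt_tokens: int  # assumption: the shared prompt is counted once
    completion_tokens: int = 0
    finished: int = 0

    def finish_gen(self, gen_completion_tokens: int) -> Optional[dict]:
        """Record a finished generation.

        Returns the aggregated usage only when the last generation finishes,
        so the caller can attach it to that generation's final chunk.
        """
        self.completion_tokens += gen_completion_tokens
        self.finished += 1
        if self.finished < self.n:
            return None
        return {
            "prompt_tokens": self.prompt_tokens,
            "completion_tokens": self.completion_tokens,
            "total_tokens": self.prompt_tokens + self.completion_tokens,
        }
```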
turboderp
aa54098f26 Ruff: Format (line length) 2026-03-30 00:19:07 +02:00
turboderp
b4dfd2e86f Fix logging 2026-03-28 01:13:23 +01:00
turboderp
56378b946d Merge branch 'fork/devnen/full-tool-calling-support' into main_seqlog
# Conflicts:
#	common/templating.py
#	endpoints/OAI/utils/chat_completion.py
#	endpoints/OAI/utils/tools.py
2026-03-28 01:06:54 +01:00
turboderp
f3787de6a6 Ruff: Format 2026-03-27 21:47:24 +01:00
turboderp
83127ab4f8 Logging: Log messages via Seq wrapper 2026-03-27 21:38:47 +01:00
devnen
a2c7d81686 Broader model compatibility, tool_choice support, bug fixes and cleanup 2026-02-14 16:19:59 +01:00
devnen
87bbe0fac2 Full tool-calling support: XML parsing, streaming compliance, Pydantic fix, inference abort fix 2026-02-14 14:26:57 +01:00
kingbri
707d005aad API: Default tool call ID and type
Doing this helps reduce the model's burden of generating the tool
call ID and type (which is always "function"). Follow mistral's spec
for tool call IDs by using a 9 character alphanumeric string.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-07-11 01:11:09 -04:00
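
The defaults described in 707d005aad (a 9-character alphanumeric tool call ID and a type that is always "function") can be sketched like this. The function names are hypothetical and the commit's own implementation may differ, for example in how it sources randomness.

```python
import secrets
import string

_ALPHANUMERIC = string.ascii_letters + string.digits


def default_tool_call_id(length: int = 9) -> str:
    """Return a random alphanumeric ID (9 characters by default)."""
    return "".join(secrets.choice(_ALPHANUMERIC) for _ in range(length))


def with_defaults(tool_call: dict) -> dict:
    """Fill in `id` and `type` so the model does not have to generate them."""
    filled = dict(tool_call)
    filled.setdefault("id", default_tool_call_id())
    filled.setdefault("type", "function")
    return filled
```

Values supplied by the model are left untouched; only missing keys are filled in.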
kingbri
879f4cee7e API: Modify tool calling for wider compat
When revisiting tool calls, the formats have more or less become standard.
For greater compatibility with templates, primarily use the message.tools
parameter and remove the extra custom metadata that is no longer required.

However, unlike other backends, tabbyAPI still uses template metadata
to declare what the tool start string is. This allows template-level
customization and gives more power to the user, while the server simply
consumes that metadata rather than handling each model case by case.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-07-05 14:28:12 -04:00
Andrew Phillips
436ce752da Support more common tool variables in templates (tools, message.tool_calls) (#308)
* Add non-JSON version of `tools` and `functions` to `template_vars`.

Increase compatibility with vLLM templates, which use a non-JSON tools object.

* Add list of tool template variables to the documentation

* Use Jinja templates to provide `tools_json` and `functions_json`

This should be functionally equivalent, but the JSON won't be produced
unless it's needed.

* Make message.tool_calls match the JSON from ToolCallProcessor

* Log something when generating tool calls

* Add template for Qwen QwQ 32b

* Only log if tool calls have been detected

* API: Fix tool call variable assignments

Jinja functions do not run when variables are called. Use json.dumps
instead. In addition, log the request ID when stating that a tool
call was fired.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>

* Add `ToolCallProcessor.dump()` to get the list of processed dicts

* Remove qwen_qwq_32b.jinja

This will be added to the following repository at a later date:
https://github.com/theroyallab/llm-prompt-templates

---------

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
Co-authored-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-03-23 13:23:00 -04:00
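
The fix noted in 436ce752da (serialize with json.dumps up front because Jinja functions do not run when variables are referenced) amounts to something like the sketch below. The helper name and the `indent=2` choice are assumptions; only the variable names `tools`, `tools_json`, `functions` and `functions_json` come from the commit message.

```python
import json


def build_tool_template_vars(tools: list, functions: list) -> dict:
    """Hypothetical helper: expose both raw and JSON forms to the template."""
    return {
        # Raw objects, for templates that iterate over tools themselves.
        "tools": tools,
        "functions": functions,
        # Pre-serialized strings, computed eagerly with json.dumps.
        "tools_json": json.dumps(tools, indent=2),
        "functions_json": json.dumps(functions, indent=2),
    }
```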
kingbri
d98c0bd3f6 API: Add tools class
Was mistakenly not added in PR 302.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-03-14 15:07:11 -04:00