8 Commits

Author SHA1 Message Date
turboderp
3e3d7ccd54 Tools: Add step3_5 alias (qwen3_coder tool format) 2026-04-18 19:55:34 +02:00
turboderp
27d29209c6 Tools: Add Gemma4 parser 2026-04-10 00:16:58 +02:00
turboderp
db9048e59b Docs: Tool calling 2026-04-08 19:39:42 +02:00
turboderp
0409064028 Tools: Refactor and further simplify tool parsing
- remove ToolConfig, reduce to a single `tool_format` argument and hard-code extra args like start/end tokens
- dispatch to short, self-contained (and probably easily vibe coded) parser for each model type
- remove autodetection (seems infeasible since parsing effectively starts during streaming, and there is overlap between tool formats for different models)
- streamline xml parser and dedicate to qwen3_coder models
- add parsers for glm4.x, minimax-m2.x and mistral (seems shaky, probably because mistralai don't validate against hf)
- update docs
2026-04-01 00:07:44 +02:00
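
The dispatch described in the commit above (a single `tool_format` argument selecting a short, self-contained parser) could look roughly like the sketch below. Everything in it is an illustrative assumption rather than the project's actual code: the function names, the registry keys `json_example` and `xml_example`, and the two simplified call layouts are all hypothetical.

```python
import json
import re
from typing import Callable

# Hypothetical parser for JSON-style calls wrapped in <tool_call>...</tool_call>.
def parse_json_calls(text: str) -> list[dict]:
    calls = []
    for block in re.findall(r"<tool_call>(.*?)</tool_call>", text, re.DOTALL):
        data = json.loads(block)
        calls.append({"name": data["name"], "arguments": data.get("arguments", {})})
    return calls

# Hypothetical parser for an XML-like layout (assumed, simplified):
# <function=NAME><parameter=KEY>VALUE</parameter></function>
def parse_xml_calls(text: str) -> list[dict]:
    calls = []
    for name, body in re.findall(r"<function=([^>]+)>(.*?)</function>", text, re.DOTALL):
        args = {
            key.strip(): value.strip()
            for key, value in re.findall(
                r"<parameter=([^>]+)>(.*?)</parameter>", body, re.DOTALL
            )
        }
        calls.append({"name": name.strip(), "arguments": args})
    return calls

# One entry per supported format; no autodetection, the caller must choose.
PARSERS: dict[str, Callable[[str], list[dict]]] = {
    "json_example": parse_json_calls,
    "xml_example": parse_xml_calls,
}

def parse_tool_calls(tool_format: str, text: str) -> list[dict]:
    """Look up the parser for tool_format and run it on the collected text."""
    try:
        parser = PARSERS[tool_format]
    except KeyError:
        raise ValueError(f"unknown tool_format: {tool_format!r}") from None
    return parser(text)
```

Keeping each parser independent means a format-specific quirk only touches one function, at the cost of some duplicated regex handling.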
turboderp
179479199b Rework tool calls and OAI chat completions
- move tool config from template_vars to separate yml config
- new per-gen stream collector used for both streaming and non-streaming requests to ensure logic is consistent for both
- move responsibility for switching between phases to stream collector
- collect tool calls during streaming and parse at the end of each gen
- prevent streaming empty content spans (be nice to clients)
- correctly aggregate usage stats for n>1 requests, always emit them with the last chunk of the last gen to finish
- collect logprobs in model wrapper and correctly handle logprobs for multi-token chars etc.
- respect top_logprobs argument in request
- handle a number of edge cases like <think> tag being part of held string, etc.
- retain tool parsing and inference-abort fixes from #413, apply similar fix to non-stream request as well

Still TODO:
- testing and validation with more models and tool schemas (tested on Qwen so far)
- enable JSON constraint for JSON tool models
- possibly some pydantification
- documentation
2026-03-30 00:22:55 +02:00
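
As a rough illustration of the per-gen stream collector idea in the commit above, the sketch below switches from a content phase to a tool-call phase, withholds text that might be the start of a marker, never returns an empty content span, and leaves tool-call parsing until the generation ends. The class name, method names, and the `<tool_call>` marker are assumptions made for this example; the real collector also handles reasoning tags, logprobs, and usage stats, which are left out here.

```python
from typing import Optional, Tuple

class StreamCollector:
    """Hypothetical per-generation collector; not the project's actual class."""

    def __init__(self, tool_start: str = "<tool_call>"):
        self.tool_start = tool_start  # assumed start marker
        self.held = ""                # text withheld because it may begin a marker
        self.in_tool = False          # phase flag: content vs. tool-call text
        self.tool_text = ""           # raw tool-call text, parsed after the gen ends

    def feed(self, chunk: str) -> Optional[str]:
        """Return content that is safe to stream now, or None if there is none."""
        if self.in_tool:
            self.tool_text += chunk
            return None
        text = self.held + chunk
        self.held = ""
        idx = text.find(self.tool_start)
        if idx >= 0:
            # Switch phase: everything from the marker onward is tool-call text.
            self.in_tool = True
            self.tool_text = text[idx:]
            out = text[:idx]
        else:
            # Hold back the longest tail that is a proper prefix of the marker.
            keep = 0
            for n in range(min(len(text), len(self.tool_start) - 1), 0, -1):
                if self.tool_start.startswith(text[-n:]):
                    keep = n
                    break
            out = text[: len(text) - keep]
            self.held = text[len(text) - keep:]
        return out or None  # never hand an empty span to the client

    def finish(self) -> Tuple[Optional[str], str]:
        """Flush withheld content and return it with the collected tool-call text."""
        out, self.held = self.held, ""
        return (out or None), self.tool_text
```

Using the same collector for streaming and non-streaming requests would mean a non-streaming request simply concatenates whatever `feed` and `finish` return.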
kingbri
1c3f84151f Docs: Update tool calling
For new variables and format.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-07-05 21:43:04 -04:00
Andrew Phillips
436ce752da Support more common tool variables in templates (tools, message.tool_calls) (#308)
* Add non-JSON version of `tools` and `functions` to `template_vars`.

Increase compatibility with vLLM templates, which use a non-JSON tools object.

* Add list of tool template variables to the documentation

* Use Jinja templates to provide `tools_json` and `functions_json`

This should be functionally equivalent, but the JSON won't be produced
unless it's needed.

* Make message.tool_calls match the JSON from ToolCallProcessor

* Log something when generating tool calls

* Add template for Qwen QwQ 32b

* Only log if tool calls have been detected

* API: Fix tool call variable assignments

Jinja functions do not run when variables are called. Use json.dumps
instead. In addition, log the request ID when stating that a tool
call was fired.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>

* Add `ToolCallProcessor.dump()` to get the list of processed dicts

* Remove qwen_qwq_32b.jinja

This will be added to the following repository at a later date:
https://github.com/theroyallab/llm-prompt-templates

---------

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
Co-authored-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-03-23 13:23:00 -04:00
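
The variable handling described in the commit above (plain `tools` and `functions` objects alongside `tools_json` and `functions_json` strings produced with `json.dumps`) might be sketched as follows. The helper name `build_tool_template_vars` and its signature are hypothetical; only the four variable names come from the commit message.

```python
import json
from typing import Any, Optional

def build_tool_template_vars(
    tools: Optional[list] = None,
    functions: Optional[list] = None,
) -> dict[str, Any]:
    """Hypothetical helper: expose tools both as objects and as JSON strings."""
    tools = tools or []
    functions = functions or []
    return {
        # Plain objects, for templates that iterate over the tool definitions.
        "tools": tools,
        "functions": functions,
        # Pre-serialized strings, for templates that paste JSON into the prompt.
        "tools_json": json.dumps(tools),
        "functions_json": json.dumps(functions),
    }
```

Serializing eagerly with `json.dumps` trades a little unused work for predictable behavior, since the string exists whether or not the template references it.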
kingbri
5614b342a7 Tree: Migrate docs into repository
This will auto-publish to the GitHub wiki via an action.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-02-17 23:39:35 -05:00