tabbyAPI

mirror of https://github.com/theroyallab/tabbyAPI.git synced 2026-05-12 16:55:50 +00:00

Author	SHA1	Message	Date
turboderp	27d29209c6	Tools: Add Gemma4 parser	2026-04-10 00:16:58 +02:00
turboderp	79d581e1f5	OAI endpoints: More rework - remove disconnect_task - move disconnect logic to a per-request handler that wraps cleanup operation and directly polls the request state with throttling - exclusively signal disconnect with CancelledError - rework completions endpoint to follow same approach as chat completions, share some code - refactor OAI endpoints a bit - correct behavior for batched completion requests - make sure logprobs work for completion and streaming completion requests - more tests	2026-04-02 01:26:44 +02:00
turboderp	0409064028	Tools: Refactor and further simplify tool parsing - remove ToolConfig, reduce to a single `tool_format` argument and hard-code extra args like start/end tokens - dispatch to short, self-contained (and probably easily vibe coded) parser for each model type - remove autodetection (seems infeasible since parsing effectively starts during streaming, and there is overlap between tool formats for different models) - streamline xml parser and dedicate to qwen3_coder models - add parsers for glm4.x, minimax-m2.x and mistral (seems shaky, probably because mistralai don't validate against hf) - update docs	2026-04-01 00:07:44 +02:00
turboderp	179479199b	Rework tool calls and OAI chat completions - move tool config from template_vars to separate yml config - new per-gen stream collector used for both streaming and non-streaming requests to ensure logic is consistent for both - move responsibility for switching between phases to stream collector - collect tool calls during streaming and parse at the end of each gen - prevent streaming empty content spans (be nice to clients) - correctly aggregate usage stats for n>1 requests, always emit with last chunk in last gen to finish - collect logprobs in model wrapper and correctly handle logprobs for multi-token chars etc. - respect top_logprobs argument in request - handle a number of edge cases like <think> tag being part of held string, etc. - retain tool parsing and inference-abort fixes from #413, apply similar fix to non-stream request as well Still TODO: - testing and validation with more models and tool schemas (tested on Qwen so far) - enable JSON constraint for JSON tool models - possibly some pydantification - documentation	2026-03-30 00:22:55 +02:00
turboderp	aa54098f26	Ruff: Format (line length)	2026-03-30 00:19:07 +02:00
turboderp	b4dfd2e86f	Fix logging	2026-03-28 01:13:23 +01:00
turboderp	56378b946d	Merge branch 'fork/devnen/full-tool-calling-support' into main_seqlog # Conflicts: # common/templating.py # endpoints/OAI/utils/chat_completion.py # endpoints/OAI/utils/tools.py	2026-03-28 01:06:54 +01:00
turboderp	f3787de6a6	Ruff: Format	2026-03-27 21:47:24 +01:00
turboderp	83127ab4f8	Logging: Log messages via Seq wrapper	2026-03-27 21:38:47 +01:00
devnen	a2c7d81686	Broader model compatibility, tool_choice support, bug fixes and cleanup	2026-02-14 16:19:59 +01:00
devnen	87bbe0fac2	Full tool-calling support: XML parsing, streaming compliance, Pydantic fix, inference abort fix	2026-02-14 14:26:57 +01:00
kingbri	707d005aad	API: Default tool call ID and type Doing this helps reduce the model's burden of generating the tool call ID and type (which is always "function"). Follow mistral's spec for tool call IDs by using a 9 character alphanumeric string. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-07-11 01:11:09 -04:00
kingbri	879f4cee7e	API: Modify tool calling for wider compat When revisiting tool calls, the formats have more or less become standard. For greater compatibility with templates, primarily use the message.tools parameter and remove the extra custom metadata that is no longer required. However, unlike other backends, tabbyAPI still uses template metadata to declare what the tool start string is. This allows for template-level customization along with giving more power to the user while the server exists to consume rather than work on a case-by-case basis. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-07-05 14:28:12 -04:00
Andrew Phillips	436ce752da	Support more common tool variables in templates (tools, message.tool_calls) (#308) * Add non-JSON version of `tools` and `functions` to `template_vars`. Increase the compatibility with VLLM templates which use a non-JSON tools object. * Add list of tool template variables to the documentation * Use Jinja templates to provide `tools_json` and `functions_json` This should be functionally equivalent, but the JSON won't be produced unless it's needed. * Make message.tool_calls match the JSON from ToolCallProcessor * Log something when generating tool calls * Add template for Qwen QwQ 32b * Only log if tool calls have been detected * API: Fix tool call variable assignments Jinja functions do not run when variables are called. Use json.dumps instead. In addition, log the request ID when stating that a tool call was fired. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com> * Add `ToolCallProcessor.dump()` to get the list of processed dicts * Remove qwen_qwq_32b.jinja This will be added to the following repository at a later date: https://github.com/theroyallab/llm-prompt-templates --------- Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com> Co-authored-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-03-23 13:23:00 -04:00
kingbri	d98c0bd3f6	API: Add tools class Was mistakenly not added in PR 302. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-03-14 15:07:11 -04:00

15 Commits