tabbyAPI

mirror of https://github.com/theroyallab/tabbyAPI.git synced 2026-04-19 22:08:59 +00:00

Author	SHA1	Message	Date
turboderp	3e3d7ccd54	Tools: Add step3_5 alias (qwen3_coder tool format)	2026-04-18 19:55:34 +02:00
turboderp	ed41c51909	API: Prevent race condition when multiple chat requests try to inline-load the same model	2026-04-18 19:55:34 +02:00
turboderp	1a4896ce66	Tree: Format	2026-04-12 13:47:05 +02:00
turboderp	f1a2416da5	OAI endpoints: Add option to suppress header after reasoning start token (e.g. Gemma4's "thought\n")	2026-04-12 04:12:53 +02:00
turboderp	2636b445f0	Tree: Format	2026-04-12 03:33:14 +02:00
turboderp	b21100f971	ExLlamaV3: Fix disconnected request handling regression	2026-04-10 22:03:19 +02:00
mindkrypted	08f92167de	Tools: Updated/fixed Gemma4 tool parser	2026-04-10 22:02:34 +02:00
turboderp	5517cb5b9e	Templates: Revert add_bos_token fix	2026-04-10 03:53:58 +02:00
turboderp	7fedc179f0	Templates: Make sure add_bos_token=False is respected	2026-04-10 03:14:29 +02:00
turboderp	27d29209c6	Tools: Add Gemma4 parser	2026-04-10 00:16:58 +02:00
turboderp	55124d0fc6	Config: Add force_enable_thinking	2026-04-10 00:16:40 +02:00
turboderp	79d581e1f5	OAI endpoints: More rework - remove disconnect_task - move disconnect logic to a per-request handler that wraps cleanup operation and directly polls the request state with throttling - exclusively signal disconnect with CancelledError - rework completions endpoint to follow same approach as chat completions, share some code - refactor OAI endpoints a bit - correct behavior for batched completion requests - make sure logprobs work for completion and streaming completion requests - more tests	2026-04-02 01:26:44 +02:00
turboderp	c315f6b73e	OAI endpoints: Correctly propagate exceptions in non-streaming mode	2026-04-01 12:27:07 +02:00
turboderp	455c09932f	OAI endpoints: Fix regression for non-reasoning models	2026-04-01 00:08:39 +02:00
turboderp	0409064028	Tools: Refactor and further simplify tool parsing - remove ToolConfig, reduce to a single `tool_format` argument and hard-code extra args like start/end tokens - dispatch to short, self-contained (and probably easily vibe coded) parser for each model type - remove autodetection (seems infeasible since parsing effectively starts during streaming, and there is overlap between tool formats for different models) - streamline xml parser and dedicate to qwen3_coder models - add parsers for glm4.x, minimax-m2.x and mistral (seems shaky, probably because mistralai don't validate against hf) - update docs	2026-04-01 00:07:44 +02:00
turboderp	112ab69002	Fix comments	2026-03-31 14:43:55 +02:00
turboderp	a7c7934ec3	Tool parsing: Include outer <tool_call> tags in raw text sent to parser	2026-03-30 04:05:15 +02:00
turboderp	02a700e065	ExLlamaV3: Limit MMEmbedding cache size	2026-03-30 03:35:46 +02:00
turboderp	9ee5ded218	OAI: Log raw requests	2026-03-30 01:23:16 +02:00
turboderp	179479199b	Rework tool calls and OAI chat completions - move tool config from template_vars to separate yml config - new per-gen stream collector used for both streaming and non-streaming requests to ensure logic is consistent for both - move responsibility for switching between phases to stream collector - collect tool calls during streaming and parse at the end of each gen - prevent streaming empty content spans (be nice to clients) - correctly aggregate usage stats for n>1 requests, always emit with last chunk in last gen to finish - collect logprobs in model wrapper and correctly handle logprobs for multi-token chars etc. - respect top_logprobs argument in request - handle a number of edge cases like <think> tag being part of held string, etc. - retain tool parsing and inference-abort fixes from #413, apply similar fix to non-stream request as well Still TODO: - testing and validation with more models and tool schemas (tested on Qwen so far) - enable JSON constraint for JSON tool models - possibly some pydantification - documentation	2026-03-30 00:22:55 +02:00
turboderp	aa54098f26	Ruff: Format (line length)	2026-03-30 00:19:07 +02:00
turboderp	2a1503b283	Logging: Use debug level for Seq instead of verbose	2026-03-29 18:51:57 +02:00
turboderp	4b3c74782d	Fix bad merge	2026-03-28 12:47:26 +01:00
turboderp	b4dfd2e86f	Fix logging	2026-03-28 01:13:23 +01:00
turboderp	56378b946d	Merge branch 'fork/devnen/full-tool-calling-support' into main_seqlog # Conflicts: # common/templating.py # endpoints/OAI/utils/chat_completion.py # endpoints/OAI/utils/tools.py	2026-03-28 01:06:54 +01:00
turboderp	f3787de6a6	Ruff: Format	2026-03-27 21:47:24 +01:00
turboderp	83127ab4f8	Logging: Log messages via Seq wrapper	2026-03-27 21:38:47 +01:00
turboderp	40aa82da28	API: More robust test for whether generation starts in reasoning mode	2026-03-27 01:29:17 +01:00
turboderp	0d1a8ba784	API: Try to guess whether streaming response should start with content or reasoning_content	2026-03-21 01:11:01 +01:00
turboderp	0d577b8121	Cleanup and formatting	2026-03-20 01:27:29 +01:00
turboderp	6bccc70d94	Tree: Formatting	2026-03-18 03:29:15 +01:00
turboderp	d2117a7c3b	Config: Pass reasoning settings in kwargs, allow for overrides via tabby_config.yml	2026-03-18 00:24:22 +01:00
turboderp	8eb6c65008	Merge branch 'main' into fork/Orion-zhen/feat_reasoning # Conflicts: # config_sample.yml	2026-03-17 23:05:19 +01:00
devnen	a2c7d81686	Broader model compatibility, tool_choice support, bug fixes and cleanup	2026-02-14 16:19:59 +01:00
devnen	87bbe0fac2	Full tool-calling support: XML parsing, streaming compliance, Pydantic fix, inference abort fix	2026-02-14 14:26:57 +01:00
turboderp	d672dc2137	API: Fix race condition when client disconnects	2025-10-05 21:23:02 +02:00
kingbri	0b4ca567f8	API: Persist request IDs and append full_text to finish chunk Adding these to each generation chunk helps remove redundancy and unnecessary request ID operations. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-07-25 12:27:44 -04:00
kingbri	707d005aad	API: Default tool call ID and type Doing this helps reduce the model's burden of generating the tool call ID and type (which is always "function"). Follow mistral's spec for tool call IDs by using a 9 character alphanumeric string. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-07-11 01:11:09 -04:00
kingbri	5b1db3ad83	API: Don't do a second re-render when tool calling Re-rendering the template is an expensive operation when it's possible to just concatenate the prompt and current generation text together. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-07-06 11:32:36 -04:00
kingbri	3dfa965019	API: Add tool_call_id for role = tool If a message with role = tool is present, the tool_call_id should also be given to the template. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-07-05 21:52:58 -04:00
kingbri	879f4cee7e	API: Modify tool calling for wider compat When revisiting tool calls, the formats have more or less become standard. For greater compatibility with templates, primarily use the message.tools parameter and remove the extra custom metadata that is no longer required. However, unlike other backends, tabbyAPI still uses template metadata to declare what the tool start string is. This allows for template-level customization along with giving more power to the user while the server exists to consume rather than work on a case-by-case basis. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-07-05 14:28:12 -04:00
kingbri	b6a26da50c	API: Fix tool call serialization To render in the template, tool call start tokens needed to have less checks and remove the line to convert message.tool_calls to a dict since that breaks the rest of the chain by disconnecting the types. model_dump on the message itself already accomplishes this. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-07-04 15:02:49 -04:00
kingbri	2913ce29fc	API: Add timings to usage stats It's useful for the client to know what the T/s and total time for generation are per-request. Works with both completions and chat completions. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-06-17 22:54:51 -04:00
kingbri	2d89c96879	API: Re-add BOS token stripping in template render Matching YALS, if the model has add_bos_token enabled, then remove an extra BOS token at the start of the prompt. This usually happens with misconfigured templates such as Llama 3. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-24 21:11:53 -04:00
kingbri	10fbe043a4	API: Fix typing for chat templates in CC requests Tools must be None by default. Chat completion message content can be None, a string, or a list, so default to None. Exclude all None values from a CC message since the template can say the variable "exists" despite being None, causing an error. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-24 21:06:05 -04:00
kingbri	54b8a20a19	API: Fix types for chat completions Messages were mistakenly being sent as Pydantic objects, but templates expect dictionaries. Properly convert these before render. In addition, initialize all Optional lists as an empty list since this will cause the least problems when interacting with other parts of API code, such as templates. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-17 18:10:34 -04:00
kingbri	0858b6d4b2	Tree: Format Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-17 00:46:40 -04:00
kingbri	7900b72848	API: Add chat_template_kwargs alias for template_vars This key is used in VLLM and SGLang. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-12 15:48:39 -04:00
Brian	b555eeb6e7	Merge pull request #339 from Maaaxiii/fix/tool-calling-embeddings fix: Aligned Parameter Name in chat completions generate_tool_calls	2025-05-11 20:41:58 -04:00
kingbri	6379081dd8	Sampling: Make add_bos_token override concise Also set the default to None so text completions follows the same pattern. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-10 19:07:35 -04:00

184 Commits