tabbyAPI

mirror of https://github.com/theroyallab/tabbyAPI.git synced 2026-05-12 00:41:22 +00:00

Author	SHA1	Message	Date
turboderp	f1a2416da5	OAI endpoints: Add option to suppress header after reasoning start token (e.g. Gemma4's "thought\n")	2026-04-12 04:12:53 +02:00
turboderp	55124d0fc6	Config: Add force_enable_thinking	2026-04-10 00:16:40 +02:00
turboderp	0409064028	Tools: Refactor and further simplify tool parsing: remove ToolConfig, reduce to a single `tool_format` argument and hard-code extra args like start/end tokens; dispatch to short, self-contained (and probably easily vibe coded) parser for each model type; remove autodetection (seems infeasible since parsing effectively starts during streaming, and there is overlap between tool formats for different models); streamline xml parser and dedicate to qwen3_coder models; add parsers for glm4.x, minimax-m2.x and mistral (seems shaky, probably because mistralai don't validate against hf); update docs	2026-04-01 00:07:44 +02:00
turboderp	179479199b	Rework tool calls and OAI chat completions: move tool config from template_vars to separate yml config; new per-gen stream collector used for both streaming and non-streaming requests to ensure logic is consistent for both; move responsibility for switching between phases to stream collector; collect tool calls during streaming and parse at the end of each gen; prevent streaming empty content spans (be nice to clients); correctly aggregate usage stats for n>1 requests, always emit with last chunk in last gen to finish; collect logprobs in model wrapper and correctly handle logprobs for multi-token chars etc.; respect top_logprobs argument in request; handle a number of edge cases like <think> tag being part of held string, etc.; retain tool parsing and inference-abort fixes from #413, apply similar fix to non-stream request as well. Still TODO: testing and validation with more models and tool schemas (tested on Qwen so far); enable JSON constraint for JSON tool models; possibly some pydantification; documentation	2026-03-30 00:22:55 +02:00
turboderp	aa54098f26	Ruff: Format (line length)	2026-03-30 00:19:07 +02:00
turboderp	8b1bfeaba7	Model: Make sure reasoning tokens are always defined	2026-03-20 20:41:44 +01:00
turboderp	0d577b8121	Cleanup and formatting	2026-03-20 01:27:29 +01:00
kingbri	390daeb92f	Model: Create universal HFModel class. The HFModel class serves to coalesce all config files that contain random keys which are required for model usage. Adding this base class allows us to expand as HuggingFace randomly changes their JSON schemas over time, reducing the brunt that backend devs need to feel when their next model isn't supported. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-13 18:12:38 -04:00
kingbri	0c1d794390	Model: Add exl3 and associated load functions. Initial exl3 compat and loading functionality. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-02 21:32:39 -04:00
kingbri	242f6b7d2a	Model: Simplify add_bos_token handling. Set add_bos_token to True by default in the tokenizer_config stub. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-02 21:32:28 -04:00
kingbri	aa657fa6e9	API: Ignore add_bos_token in chat completions. When fetching special tokens from the model, don't factor in the add_bos_token and ban_eos_token parameters as switches. In addition, change the internal handling of add_bos_token to an optional boolean. This allows us to fall back to the model when selecting whether or not to add the BOS token, especially for chat completions. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-01 22:51:15 -04:00
kingbri	f070587e9f	Model: Add proper jobs cleanup and fix var calls. Jobs should be started and immediately cleaned up when calling the generation stream. Expose a stream_generate function and append this to the base class since it's more idiomatic than generate_gen. The exl2 container's generate_gen function is now internal. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-04-24 21:30:55 -04:00
kingbri	f2c7da2faf	Tree: Format. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-04-21 23:21:26 -04:00
kingbri	3f09fcd8c9	Model: Make model params return a model card. The model card is a unified structure for sharing model params. Rather than kwargs, use this instead. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-04-21 23:15:46 -04:00
kingbri	034682fcf1	Backends: Add base model container. Base class for all model containers. Used in the shared model file for interface. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-04-20 17:24:10 -04:00

15 Commits