tabbyAPI

Mirror of https://github.com/theroyallab/tabbyAPI.git, synced 2026-04-21 23:09:13 +00:00

Author	SHA1	Message	Date
turboderp	6aa842a1b2	Dependencies: Update exllamav3	2026-04-20 23:11:30 +02:00
turboderp	3e3d7ccd54	Tools: Add step3_5 alias (qwen3_coder tool format)	2026-04-18 19:55:34 +02:00
turboderp	ed41c51909	API: Prevent race condition when multiple chat requests try to inline-load the same model	2026-04-18 19:55:34 +02:00
turboderp	5b2b707af9	exllamav3: Account for bsz=2 in autosplit	2026-04-18 19:55:34 +02:00
turboderp	9ebbe06f29	exllamav3: Supply max_chunk_size when loading model	2026-04-18 13:20:12 +02:00
turboderp	f74f16a5c2	Config: Make recurrent cache size configurable	2026-04-17 02:40:22 +02:00
turboderp	bd589272cc	Config: Make cuda_malloc_async configurable again, change import order to make sure config is loaded before torch is imported	2026-04-17 02:39:16 +02:00
turboderp	32eed618dc	Dependencies: Add requests	2026-04-12 13:51:54 +02:00
turboderp	1a4896ce66	Tree: Format	2026-04-12 13:47:05 +02:00
turboderp	510bf7bf6c	Update README.md	2026-04-12 13:44:26 +02:00
turboderp	f1a2416da5	OAI endpoints: Add option to suppress header after reasoning start token (e.g. Gemma4's "thought\n")	2026-04-12 04:12:53 +02:00
turboderp	2636b445f0	Tree: Format	2026-04-12 03:33:14 +02:00
turboderp	bb64f8f18e	Dependencies: Update exllamav3	2026-04-12 03:31:54 +02:00
turboderp	3a42c1756c	ExLlamaV2: Use new disconnect handler	2026-04-10 22:04:21 +02:00
turboderp	b21100f971	ExLlamaV3: Fix disconnected request handling regression	2026-04-10 22:03:19 +02:00
mindkrypted	08f92167de	Tools: Updated/fixed Gemma4 tool parser	2026-04-10 22:02:34 +02:00
turboderp	5517cb5b9e	Templates: Revert add_bos_token fix	2026-04-10 03:53:58 +02:00
turboderp	7fedc179f0	Templates: Make sure add_bos_token=False is respected	2026-04-10 03:14:29 +02:00
turboderp	27d29209c6	Tools: Add Gemma4 parser	2026-04-10 00:16:58 +02:00
turboderp	55124d0fc6	Config: Add force_enable_thinking	2026-04-10 00:16:40 +02:00
turboderp	db9048e59b	Docs: Tool calling	2026-04-08 19:39:42 +02:00
turboderp	79d581e1f5	OAI endpoints: More rework - remove disconnect_task - move disconnect logic to a per-request handler that wraps cleanup operation and directly polls the request state with throttling - exclusively signal disconnect with CancelledError - rework completions endpoint to follow same approach as chat completions, share some code - refactor OAI endpoints a bit - correct behavior for batched completion requests - make sure logprobs work for completion and streaming completion requests - more tests	2026-04-02 01:26:44 +02:00
turboderp	c315f6b73e	OAI endpoints: Correctly propagate exceptions in non-streaming mode	2026-04-01 12:27:07 +02:00
turboderp	455c09932f	OAI endpoints: Fix regression for non-reasoning models	2026-04-01 00:08:39 +02:00
turboderp	0409064028	Tools: Refactor and further simplify tool parsing - remove ToolConfig, reduce to a single `tool_format` argument and hard-code extra args like start/end tokens - dispatch to short, self-contained (and probably easily vibe coded) parser for each model type - remove autodetection (seems infeasible since parsing effectively starts during streaming, and there is overlap between tool formats for different models) - streamline xml parser and dedicate to qwen3_coder models - add parsers for glm4.x, minimax-m2.x and mistral (seems shaky, probably because mistralai don't validate against hf) - update docs	2026-04-01 00:07:44 +02:00
turboderp	b6428b1676	Seq: Allow longer strings in log	2026-03-31 18:18:07 +02:00
turboderp	112ab69002	Fix comments	2026-03-31 14:43:55 +02:00
turboderp	bc66ba4b8b	Merge branch 'main' into main_tools	2026-03-30 23:07:53 +02:00
turboderp	c887ae88fc	Dependencies: Update exllamav3	2026-03-30 23:07:00 +02:00
turboderp	a7c7934ec3	Tool parsing: Include outer <tool_call> tags in raw text sent to parser	2026-03-30 04:05:15 +02:00
turboderp	41ed1e4881	Seq: Sanitize extra log data	2026-03-30 03:36:30 +02:00
turboderp	02a700e065	ExLlamaV3: Limit MMEmbedding cache size	2026-03-30 03:35:46 +02:00
turboderp	ba4309b948	ExLlamaV3: Replace MMEmbedding lru_cache with dict to avoid storing arbitrarily large uuencoded images as keys	2026-03-30 02:55:21 +02:00
turboderp	a035bc9e94	Model: Fix regression	2026-03-30 02:37:27 +02:00
turboderp	9ee5ded218	OAI: Log raw requests	2026-03-30 01:23:16 +02:00
turboderp	357eebffd2	Logger: Fix invalid escape sequence (gave syntax warning)	2026-03-30 00:33:01 +02:00
turboderp	9f565562dd	Add inference test scripts	2026-03-30 00:23:25 +02:00
turboderp	179479199b	Rework tool calls and OAI chat completions - move tool config from template_vars to separate yml config - new per-gen stream collector used for both streaming and non-streaming requests to ensure logic is consistent for both - move responsibility for switching between phases to stream collector - collect tool calls during streaming and parse at the end of each gen - prevent streaming empty content spans (be nice to clients) - correctly aggregate usage stats for n>1 requests, always emit with last chunk in last gen to finish - collect logprobs in model wrapper and correctly handle logprobs for multi-token chars etc. - respect top_logprobs argument in request - handle a number of edge cases like <think> tag being part of held string, etc. - retain tool parsing and inference-abort fixes from #413, apply similar fix to non-stream request as well Still TODO: - testing and validation with more models and tool schemas (tested on Qwen so far) - enable JSON constraint for JSON tool models - possibly some pydantification - documentation	2026-03-30 00:22:55 +02:00
turboderp	aa54098f26	Ruff: Format (line length)	2026-03-30 00:19:07 +02:00
turboderp	2a1503b283	Logging: Use debug level for Seq instead of verbose	2026-03-29 18:51:57 +02:00
turboderp	47d08729ed	Ruff: Raise line length limit to 100	2026-03-28 19:49:17 +01:00
turboderp	4b3c74782d	Fix bad merge	2026-03-28 12:47:26 +01:00
turboderp	b4dfd2e86f	Fix logging	2026-03-28 01:13:23 +01:00
turboderp	56378b946d	Merge branch 'fork/devnen/full-tool-calling-support' into main_seqlog # Conflicts: # common/templating.py # endpoints/OAI/utils/chat_completion.py # endpoints/OAI/utils/tools.py	2026-03-28 01:06:54 +01:00
turboderp	f3787de6a6	Ruff: Format	2026-03-27 21:47:24 +01:00
turboderp	83127ab4f8	Logging: Log messages via Seq wrapper	2026-03-27 21:38:47 +01:00
turboderp	c32a628917	Logging: Add Seq wrapper	2026-03-27 21:38:47 +01:00
turboderp	1a7191702d	Dependencies: Update exllamav3	2026-03-27 02:54:42 +01:00
turboderp	da3d3338e8	Logging: Fix env var parsing, formatting	2026-03-27 02:31:36 +01:00
turboderp	a3eabecf39	Logging: Add TABBY_LOG_CONSOLE_WIDTH to enable wider console log	2026-03-27 01:30:13 +01:00

1194 Commits