tabbyAPI

mirror of https://github.com/theroyallab/tabbyAPI.git synced 2026-05-24 06:34:33 +00:00

Author	SHA1	Message	Date
turboderp	41ed1e4881	Seq: Sanitize extra log data	2026-03-30 03:36:30 +02:00
turboderp	02a700e065	ExLlamaV3: Limit MMEmbedding cache size	2026-03-30 03:35:46 +02:00
turboderp	ba4309b948	ExLlamaV3: Replace MMEmbedding lru_cache with dict to avoid storing arbitrarily large uuencoded images as keys	2026-03-30 02:55:21 +02:00
turboderp	a035bc9e94	Model: Fix regression	2026-03-30 02:37:27 +02:00
turboderp	9ee5ded218	OAI: Log raw requests	2026-03-30 01:23:16 +02:00
turboderp	357eebffd2	Logger: Fix invalid escape sequence (gave syntax warning)	2026-03-30 00:33:01 +02:00
turboderp	9f565562dd	Add inference test scripts	2026-03-30 00:23:25 +02:00
turboderp	179479199b	Rework tool calls and OAI chat completions - move tool config from template_vars to separate yml config - new per-gen stream collector used for both streaming and non-streaming requests to ensure logic is consistent for both - move responsibility for switching between phases to stream collector - collect tool calls during streaming and parse at the end of each gen - prevent streaming empty content spans (be nice to clients) - correctly aggregate usage stats for n>1 requests, always emit with last chunk in last gen to finish - collect logprobs in model wrapper and correctly handle logprobs for multi-token chars etc. - respect top_logprobs argument in request - handle a number of edge cases like <think> tag being part of held string, etc. - retain tool parsing and inference-abort fixes from #413, apply similar fix to non-stream request as well Still TODO: - testing and validation with more models and tool schemas (tested on Qwen so far) - enable JSON constraint for JSON tool models - possibly some pydantification - documentation	2026-03-30 00:22:55 +02:00
turboderp	aa54098f26	Ruff: Format (line length)	2026-03-30 00:19:07 +02:00
turboderp	2a1503b283	Logging: Use debug level for Seq instead of verbose	2026-03-29 18:51:57 +02:00
turboderp	47d08729ed	Ruff: Raise line length limit to 100	2026-03-28 19:49:17 +01:00
turboderp	4b3c74782d	Fix bad merge	2026-03-28 12:47:26 +01:00
turboderp	b4dfd2e86f	Fix logging	2026-03-28 01:13:23 +01:00
turboderp	56378b946d	Merge branch 'fork/devnen/full-tool-calling-support' into main_seqlog # Conflicts: # common/templating.py # endpoints/OAI/utils/chat_completion.py # endpoints/OAI/utils/tools.py	2026-03-28 01:06:54 +01:00
turboderp	f3787de6a6	Ruff: Format	2026-03-27 21:47:24 +01:00
turboderp	83127ab4f8	Logging: Log messages via Seq wrapper	2026-03-27 21:38:47 +01:00
turboderp	c32a628917	Logging: Add Seq wrapper	2026-03-27 21:38:47 +01:00
turboderp	1a7191702d	Dependencies: Update exllamav3	2026-03-27 02:54:42 +01:00
turboderp	da3d3338e8	Logging: Fix env var parsing, formatting	2026-03-27 02:31:36 +01:00
turboderp	a3eabecf39	Logging: Add TABBY_LOG_CONSOLE_WIDTH to enable wider console log	2026-03-27 01:30:13 +01:00
turboderp	40aa82da28	API: More robust test for whether generation starts in reasoning mode	2026-03-27 01:29:17 +01:00
turboderp	ffca853d4c	ExLlamaV3: Force minimum rep_decay of 1 token, pending update to backend	2026-03-22 14:51:08 +01:00
turboderp	92cb48c38d	ExLlamaV3: Fix regression in max_seq_len limit	2026-03-22 00:34:47 +01:00
turboderp	0d1a8ba784	API: Try to guess whether streaming response should start with content or reasoning_content	2026-03-21 01:11:01 +01:00
turboderp	803ca5c681	Tree: Format	2026-03-20 20:56:43 +01:00
turboderp	088e196cbc	ExLlamaV3: Change cache size fallback value to max_seq_len, add warning to configure manually	2026-03-20 20:42:14 +01:00
turboderp	8b1bfeaba7	Model: Make sure reasoning tokens are always defined	2026-03-20 20:41:44 +01:00
turboderp	78c5993c27	ExLlamaV3: Correctly report when vision is supported but not enabled	2026-03-20 01:33:38 +01:00
turboderp	0d577b8121	Cleanup and formatting	2026-03-20 01:27:29 +01:00
turboderp	6bccc70d94	Tree: Formatting	2026-03-18 03:29:15 +01:00
turboderp	53357047ef	Delete redundant test script	2026-03-18 00:24:49 +01:00
turboderp	d2117a7c3b	Config: Pass reasoning settings in kwargs, allow for overrides via tabby_config.yml	2026-03-18 00:24:22 +01:00
turboderp	8eb6c65008	Merge branch 'main' into fork/Orion-zhen/feat_reasoning # Conflicts: # config_sample.yml	2026-03-17 23:05:19 +01:00
turboderp	ccd171cefb	Dependencies: Update exllamav3	2026-03-17 03:01:22 +01:00
turboderp	c2452414e1	Model: Ignore inline load requests if the requested model is already loaded	2026-03-17 03:00:27 +01:00
turboderp	6bf3670372	Model: Correctly read max_position_embeddings in nested config Rework how max_seq_len is determined from user settings, model defaults and cache size constraint	2026-03-17 02:58:47 +01:00
turboderp	724060b058	Dependencies: Update exllamav3	2026-03-13 23:14:09 +01:00
turboderp	761e26a137	Dependencies: Update exllamav3	2026-03-05 18:09:34 +01:00
devnen	a2c7d81686	Broader model compatibility, tool_choice support, bug fixes and cleanup	2026-02-14 16:19:59 +01:00
devnen	87bbe0fac2	Full tool-calling support: XML parsing, streaming compliance, Pydantic fix, inference abort fix	2026-02-14 14:26:57 +01:00
turboderp	41511f56c6	Dependencies: Update exllamav3	2026-02-09 22:54:29 +01:00
turboderp	54e3ea1fb3	Tree: Format	2026-01-20 22:57:36 +01:00
turboderp	0985c7f7b7	Sampling: Add adaptive-P params	2026-01-20 19:09:54 +01:00
turboderp	8a824cb127	Dependencies: Update exllamav3	2026-01-20 18:52:44 +01:00
kingbri	84bb1ce9fd	Dependencies: Fix FA2 wheels Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-12-19 16:52:05 -05:00
kingbri	5627f4d69e	Dependencies: Update to torch 2.9 Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-12-19 15:59:40 -05:00
turboderp	f04fc6eb25	Dependencies: Update exllamav3	2025-12-16 12:58:31 +01:00
Brian	55288e5a1f	Merge pull request #402 from AlpinDale/auto-select-gpu [startup] auto-select GPU backend	2025-12-08 22:04:26 -05:00
AlpinDale	76ffc7c458	[startup] auto-select GPU backend	2025-12-08 23:52:02 +00:00
turboderp	8b6b793bfc	Dependencies: Update exllamav3	2025-11-25 21:17:31 +01:00


1164 Commits