tabbyAPI

mirror of https://github.com/theroyallab/tabbyAPI.git synced 2026-07-17 09:07:42 +00:00

Author	SHA1	Message	Date
turboderp	a9d60fe714	Tools: Add Harmony (GPT-OSS) tool format and reasoning parser (include missing file)	2026-07-16 03:39:58 +02:00
turboderp	a35cdc963a	Tools: Add Harmony (GPT-OSS) tool format and reasoning parser	2026-07-16 01:49:26 +02:00
turboderp	2602767dd6	Docs: Add missing items	2026-07-16 01:46:25 +02:00
AlpinDale	725ac17756	[auth] multiple API keys support (#403)	2026-07-16 00:18:42 +02:00
turboderp	a2dba97dd1	Auth: Allow multiple API keys, watch api_tokens.yml and reload keys on changes without restarting server, add tool to append new 128-bit tokens	2026-07-16 00:13:14 +02:00
turboderp	a74528968b	API: Account for continue_final_message when inferring reasoning mode	2026-07-15 23:27:04 +02:00
turboderp	7f6be58ac6	Merge remote-tracking branch 'origin/main'	2026-07-15 22:36:41 +02:00
Ethan Jones	5b39f08096	Add continue_final_message to chat completions (#431) * API: Add continue_final_message to chat completions * API: Compose response_prefix with continue_final_message	2026-07-15 22:36:29 +02:00
turboderp	224c254083	Logging: Prevent TypeError when reporting metrics (latent bug)	2026-07-15 22:23:38 +02:00
turboderp	c33b769acc	Apply fix from #433 to /v1/completions endpoint as well	2026-07-15 22:22:57 +02:00
NNN	e0f24264bb	Fix ValidationError on early client disconnect and ZeroDivisionError in draft metrics logging (#433) * Fix ValidationError when a client disconnects during prefill _chat_stream_collector initialized `generation = {}` and only set the "index" key inside the `async for` loop body. If the client disconnects before the first token is produced (still in prefill), the loop never runs, the empty dict is returned, and _compose_response constructs ChatCompletionRespChoice(index=None), raising a pydantic ValidationError. Initialize `generation = {"index": task_idx}` so a valid index is always present even when no tokens were generated. * Guard against division by zero in draft accept-rate logging When draft_accept and draft_reject are both 0 (a request that produced no draft tokens), total_draft is 0 and `accept / total_draft` raises ZeroDivisionError inside log_metrics. Guard the division and report 0.0% in that case.	2026-07-15 22:06:32 +02:00
turboderp	336f213d6b	Tree: Format	2026-07-15 18:55:08 +02:00
turboderp	f76aa21487	Merge remote-tracking branch 'origin/main' # Conflicts: # endpoints/OAI/utils/chat_completion.py	2026-07-15 18:38:13 +02:00
turboderp	1d2477a535	Model: Merge streamed results from generator (prevent backlog and reduce network overhead from SD). Fix generated_tokens accounting bug.	2026-07-15 18:30:44 +02:00
turboderp	bc99d36bed	API: Multiple revisions: remove reasoning_suppress_header; update stream parser to respect multi-token channel tags instead; make start_in_reasoning configurable (default: existing auto behavior); add "object" and "completed" tags to output chunks; fix "model_name" -> "model"; add tool_calls_in_reasoning option (default: true, prior behavior)	2026-07-15 16:58:31 +02:00
turboderp	195de2a40d	Networking: Add configurable SSE ping interval (default: 15s) to keep alive streaming connections during long prefill	2026-07-15 16:24:46 +02:00
turboderp	221d41496c	Dependencies: Unpin pydantic	2026-07-15 13:35:00 +02:00
turboderp	7f25fbc038	Templating: Move mutable fields to global scope (latent bug)	2026-07-15 12:54:23 +02:00
turboderp	d407f7208c	Sampling: Enable grammar and regex filters (already supported by Formatron)	2026-07-15 12:33:23 +02:00
turboderp	c027fa3318	Sampling: Remove sampling parameters not supported by exllamav3, warn when unsupported args given with request, wire in already-supported min_tokens and token_healing	2026-07-15 03:12:21 +02:00
turboderp	d2ad285c17	Dependencies: Update exllamav3, remove xformers and flash-attn	2026-07-15 02:55:20 +02:00
turboderp	1193ee5a02	Update README.md	2026-07-15 02:32:22 +02:00
turboderp	6457252ca1	Model: Remove redundant enum value	2026-07-15 02:31:50 +02:00
turboderp	09a695a844	Config: Remove mention of CFG from cache_size hint	2026-07-15 02:31:23 +02:00
turboderp	b94533685d	Model: Increase exllamav3 minimum version to 0.0.43, remove redundant inspects	2026-07-15 02:28:34 +02:00
turboderp	f6b7b5758e	Model: Load in a detached task so client disconnecting doesn't cancel an in-progress load.	2026-07-15 02:03:03 +02:00
turboderp	9da1739c96	Model: Explicitly initialize mutable fields on construction.	2026-07-15 01:47:35 +02:00
turboderp	7ced6ed795	Fix circular import in vision.py	2026-07-15 01:15:01 +02:00
turboderp	15bd934021	Fix Ctrl-C handling	2026-07-15 00:59:24 +02:00
turboderp	e3fe9fd7e3	API: Fix crash when calling /v1/model/draft/list with a non-admin key	2026-07-15 00:20:29 +02:00
turboderp	c8b9a8f4aa	Backends: Remove ExLlamaV2 backend and container abstraction layer. Remove BaseModelContainer abstraction. LoRA endpoints remain as stubs (supported in exllamav3, but API is undecided). Fix /v1/lora/unload unloading the entire model. The last commit with exllamav2 support is preserved on the exl2-checkpoint branch.	2026-07-14 22:40:33 +02:00
Ethan Jones	2e1378c4d0	Fix chat completions returning null logprobs without reasoning/tool tags (#430) The content-span logprobs guard collected a token's logprobs only when `tag not in [t_think_end, t_tool_end]`. With no reasoning or tool tags configured both ends are None, and an ordinary content token also has tag None, so `None not in [None, None]` is False and logprobs are never collected. /v1/chat/completions then returns null logprobs for any model lacking both reasoning and tool tags, while /v1/completions returns them correctly. Filter unset tags out of the membership test, matching the `if s` filter the split regex above already applies to the same tags.	2026-07-09 13:46:51 +02:00
kingbri	3cf468c283	Actions: Fix docker buildx casing issue Add step to change the repo name to lowercase Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2026-06-26 21:21:05 -04:00
kingbri	7e4ccd5e8c	Actions: Point to GHCR cache instead of GHA cache Need a longer term cache storage Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2026-06-26 21:13:46 -04:00
kingbri	a3bd248e08	Actions: Use GHCR as Docker layer cache Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2026-06-26 20:53:52 -04:00
suspicious-pineapple	79126f904c	API: Fix v1/token/encode endpoint after regression	2026-06-27 02:40:35 +02:00
turboderp	538654bfcb	Add tests	2026-06-27 02:37:30 +02:00
turboderp	9b6ffdc3b2	exllamav3: Expose loop detect option, enable 800-token window by default	2026-06-16 23:30:16 +02:00
turboderp	d2d87bb9e0	API: Fix error message/code when context length exceeded	2026-06-15 20:36:25 +02:00
turboderp	c1655d1234	Dependencies: Update exllamav3	2026-06-14 20:07:08 +02:00
turboderp	202fbdc6d2	Merge remote-tracking branch 'origin/main'	2026-06-14 16:16:07 +02:00
turboderp	afec2e354f	Fix git error, missing file	2026-06-14 16:15:54 +02:00
lavd	5485088231	add cu13 build (#423)	2026-06-13 01:47:33 +02:00
turboderp	671c12d78c	API: Reject oversized prompts with error code 400 before committing to EventSourceResponse	2026-06-13 01:34:24 +02:00
turboderp	ddd2c409ad	Logging: Add draft metrics	2026-06-13 00:29:53 +02:00
turboderp	26102e0251	Dependencies: Update exllamav3	2026-06-13 00:13:34 +02:00
turboderp	004e837412	exllamav3: Add draft_mode option, support MTP and n-gram drafting	2026-06-13 00:12:24 +02:00
turboderp	95d1278694	Model: Fix regression when no draft_gpu_split specified	2026-06-12 23:31:34 +02:00
turboderp	637b595bb6	Merge branch 'fork/baronrabban/fix/draft-model-gpu-split'	2026-06-12 21:13:39 +02:00
turboderp	9726fbf0a0	Dependencies: Update exllamav3	2026-06-12 21:10:41 +02:00
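
The division-by-zero guard described in commit e0f24264bb can be sketched roughly as follows. This is a minimal illustration, not the code from the repository: the function name `draft_accept_rate` is hypothetical, and only the variable names `draft_accept`, `draft_reject` and `total_draft` are taken from the commit message.

```python
def draft_accept_rate(draft_accept: int, draft_reject: int) -> float:
    """Return the draft acceptance rate as a percentage.

    Hypothetical helper: reports 0.0 when the request produced no draft
    tokens, instead of dividing by zero.
    """
    total_draft = draft_accept + draft_reject
    if total_draft == 0:
        # No draft tokens were produced, so there is no rate to compute.
        return 0.0
    return 100.0 * draft_accept / total_draft


if __name__ == "__main__":
    print(f"{draft_accept_rate(3, 1):.1f}%")
    print(f"{draft_accept_rate(0, 0):.1f}%")
```

Without the `total_draft == 0` check, the second call would raise ZeroDivisionError, which is the failure the commit message reports inside log_metrics.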

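The membership-test bug described in commit 2e1378c4d0 can be reproduced in isolation. The sketch below is an assumption-laden illustration rather than the repository's code: the two function names are hypothetical, while `tag`, `t_think_end` and `t_tool_end` follow the commit message.

```python
def collects_logprobs_buggy(tag, t_think_end, t_tool_end):
    # Original guard as quoted in the commit message: unset end tags are
    # None, so a content token whose tag is None matches them.
    return tag not in [t_think_end, t_tool_end]


def collects_logprobs_fixed(tag, t_think_end, t_tool_end):
    # Drop unset tags before the membership test, as the commit describes.
    end_tags = [t for t in (t_think_end, t_tool_end) if t]
    return tag not in end_tags


if __name__ == "__main__":
    # Model with neither reasoning nor tool tags, ordinary content token.
    print(collects_logprobs_buggy(None, None, None))
    print(collects_logprobs_fixed(None, None, None))
```

With both end tags unset, the buggy guard evaluates `None not in [None, None]`, so logprobs are skipped for every content token; the filtered version tests against an empty list and collects them.
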

1279 Commits