tabbyAPI

mirror of https://github.com/theroyallab/tabbyAPI.git synced 2026-05-12 08:46:40 +00:00

Author	SHA1	Message	Date
turboderp	f1a2416da5	OAI endpoints: Add option to suppress header after reasoning start token (e.g. Gemma4's "thought\n")	2026-04-12 04:12:53 +02:00
turboderp	55124d0fc6	Config: Add force_enable_thinking	2026-04-10 00:16:40 +02:00
turboderp	0409064028	Tools: Refactor and further simplify tool parsing - remove ToolConfig, reduce to a single `tool_format` argument and hard-code extra args like start/end tokens - dispatch to short, self-contained (and probably easily vibe coded) parser for each model type - remove autodetection (seems infeasible since parsing effectively starts during streaming, and there is overlap between tool formats for different models) - streamline xml parser and dedicate to qwen3_coder models - add parsers for glm4.x, minimax-m2.x and mistral (seems shaky, probably because mistralai don't validate against hf) - update docs	2026-04-01 00:07:44 +02:00
turboderp	179479199b	Rework tool calls and OAI chat completions - move tool config from template_vars to separate yml config - new per-gen stream collector used for both streaming and non-streaming requests to ensure logic is consistent for both - move responsibility for switching between phases to stream collector - collect tool calls during streaming and parse at the end of each gen - prevent streaming empty content spans (be nice to clients) - correctly aggregate usage stats for n>1 requests, always emit with last chunk in last gen to finish - collect logprobs in model wrapper and correctly handle logprobs for multi-token chars etc. - respect top_logprobs argument in request - handle a number of edge cases like <think> tag being part of held string, etc. - retain tool parsing and inference-abort fixes from #413, apply similar fix to non-stream request as well Still TODO: - testing and validation with more models and tool schemas (tested on Qwen so far) - enable JSON constraint for JSON tool models - possibly some pydantification - documentation	2026-03-30 00:22:55 +02:00
turboderp	aa54098f26	Ruff: Format (line length)	2026-03-30 00:19:07 +02:00
turboderp	c32a628917	Logging: Add Seq wrapper	2026-03-27 21:38:47 +01:00
turboderp	803ca5c681	Tree: Format	2026-03-20 20:56:43 +01:00
turboderp	0d577b8121	Cleanup and formatting	2026-03-20 01:27:29 +01:00
turboderp	8eb6c65008	Merge branch 'main' into fork/Orion-zhen/feat_reasoning # Conflicts: # config_sample.yml	2026-03-17 23:05:19 +01:00
kingbri	69a25d7fa6	Config + Endpoints: Make cache_size more prominent Since cache_size is a more important parameter now for multi-user setups, mark it as such by placing it below max_seq_len. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-10-14 21:53:33 -04:00
turboderp	8abdfe7b13	Config: replace disable_output_chunking flag with output_chunking	2025-10-14 02:47:52 +02:00
turboderp	52e093ae6c	Model: Enable max_rq_tokens (output chunking)	2025-10-05 18:54:45 +02:00
kingbri	067d63773e	Config: Move sampling higher in the list This has become a bigger priority with addition of the safe_defaults noob proofing. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-08-18 22:55:03 -04:00
DocShotgun	6fb0c2cdbd	Config: Update description for override_preset default * We provide safe_defaults as a default in config_sample.yml but not internally	2025-08-18 12:39:52 -07:00
DocShotgun	998abe5ad1	Config: Enable safe sampler overrides by default * Provides safe fallback samplers, intended for better out-of-the-box support for clients that do not pass sampler params	2025-08-18 12:32:28 -07:00
kingbri	43f9483bc4	Model: Add tensor_parallel_backend option This allows for users to use nccl or native depending on the GPU setup. NCCL is only available with Linux built wheels. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-08-17 22:35:10 -04:00
DocShotgun	102af306e5	Config: Remove developer arg cuda_malloc_backend * cudaMallocAsync is now enabled by default on supported configurations	2025-08-01 10:59:13 -07:00
kingbri	2096c9bad2	Model: Default max_seq_len to 4096 A common problem in TabbyAPI is that users who want to get up and running with a model always had issues with max_seq_len causing OOMs. This is because model devs set max context values in the millions which requires a lot of VRAM. To idiot-proof first time setup, make the fallback default 4096 so users can run their models. If a user still wants to use the model's max_seq_len, set it to -1. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-06-13 14:57:24 -04:00
Brian	02a8d68e17	Merge branch 'exl3' into backend-detect	2025-05-08 23:50:33 -04:00
Brian	527afc206b	Merge pull request #329 from DocShotgun/exl3 Exllamav3 cache quantization	2025-05-08 23:11:45 -04:00
DocShotgun	f8070e7707	Model: Auto detect model backend from config * Use exllamav3 for exl3 models, exllamav2 otherwise	2025-05-06 18:51:58 -07:00
DocShotgun	9dcde59c57	Model: Check for unsupported cache mode in exllamav2	2025-05-06 01:18:15 -07:00
kingbri	b683545d0e	Config: Fix argparse help Adding a comma in the description converts the string to a tuple, which isn't parseable by argparse's help. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-05 21:52:30 -04:00
DocShotgun	58e34ba4c5	Model: Exl3 cache quant settings lenient with whitespace	2025-05-03 20:35:35 -07:00
DocShotgun	68a660bdb3	Model: Initial Exl3 cache quantization support	2025-05-03 20:35:35 -07:00
kingbri	303e2dde12	Model: Correct exl3 generation, add concurrency, and cleanup Fixes application of sampler parameters by adding a new sampler builder interface. Also expose the generator class-wide and add wait_for_jobs. Finally, allow inline loading to specify the backend. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-02 21:33:25 -04:00
randoentity	306fc7cd15	fixup: autosplit reserve this probably breaks v2 support	2025-05-02 21:33:25 -04:00
kingbri	7c6a053747	Model: Add option to select backend Changing the backend switches the container that's used. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-02 21:32:39 -04:00
Orion	e1acf8c5ef	Merge branch 'main' into feat_reasoning	2025-03-20 15:25:32 +08:00
kingbri	529c90b93e	Tree: Format and lint Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-03-19 11:55:02 -04:00
kingbri	d990bbc431	Args: Remove action arguments Superseded by subcommands to perform the same action. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-03-19 11:53:47 -04:00
kingbri	79f9c6e854	Model: Remove num_experts_per_token This shouldn't even be an exposed option since changing it always breaks inference with the model. Let the model's config.json handle it. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-03-19 11:52:10 -04:00
Orion-zhen	45190004cf	🔧 move reasoning config to model section	2025-03-18 23:15:38 +08:00
Orion	41a8aa639f	🎨 god, please let me pass format check	2025-03-01 15:38:06 +08:00
Orion	1bd3968e25	🎨 shorten lines	2025-03-01 15:36:28 +08:00
Orion	dfded65254	🎨 format code according to ruff	2025-03-01 15:31:23 +08:00
Orion-zhen	57f0e5f1bc	✨ reasoning parser support	2025-03-01 13:02:29 +08:00
Brian	2e491472d1	Merge pull request #254 from lucyknada/main add draft_gpu_split option for spec decoding	2025-02-11 16:48:03 -05:00
kingbri	30f02e5453	Main: Remove uvloop/winloop from experimental status Uvloop/Winloop does provide advantages to asyncio vs the standard Proactor loop, so remove experimental status. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-02-10 21:30:48 -05:00
kingbri	beb6d8faa5	Model: Adjust draft_gpu_split and add to config The previous code overrode the existing gpu split and device idx values. This now sets an independent draft_gpu_split value and adjusts the gpu_devices check only if the draft_gpu_split array is larger than the gpu_split array. Draft gpu split is not Tensor Parallel, and defaults to gpu_split_auto if a split is not provided. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-02-08 16:09:46 -05:00
kingbri	0fadb1e5e8	Merge branch 'main' into vision	2024-11-19 21:19:21 -05:00
DocShotgun	c42655336b	Config: Add option to disable fetching content from URLs	2024-11-17 23:05:17 -08:00
kingbri	bd9e78e19e	API: Add inline exception for dummy models If an API key sends a dummy model, it shouldn't error as the server is catering to clients that expect specific OAI model names. This is a problem with inline model loading since these names would error by default. Therefore, add an exception if the provided name is in the dummy model names (which also doubles as inline strict exceptions). However, the dummy model names weren't configurable, so add a new option to specify exception names, otherwise the default is gpt-3.5-turbo. Signed-off-by: kingbri <bdashore3@proton.me>	2024-11-17 21:15:45 -05:00
kingbri	69ac0eb8aa	Model: Add vision loading support Adds the ability to load vision parts of text + image models. Requires an explicit flag in config because there isn't a way to automatically determine whether the vision tower should be used. Signed-off-by: kingbri <bdashore3@proton.me>	2024-11-11 12:10:11 -05:00
DocShotgun	603760cecb	Model: Remove override_base_seq_len	2024-10-30 10:03:08 +08:00
kingbri	126a44483c	Tree: Remove fasttensors Now a noop in upstream. Signed-off-by: kingbri <bdashore3@proton.me>	2024-09-30 00:18:47 -04:00
kingbri	b30336c75b	Tree: Format Signed-off-by: kingbri <bdashore3@proton.me>	2024-09-18 21:42:01 -04:00
kingbri	edf3a00310	Config: Make API server literals case insensitive There's no native way to handle case insensitivity in pydantic, so add a validator which converts the API server input to be lowercase. Signed-off-by: kingbri <bdashore3@proton.me>	2024-09-18 21:39:18 -04:00
kingbri	63634beb5e	Config: Clarify Rope alpha options Leaving blank will use the model's set value or auto-calculate. Signed-off-by: kingbri <bdashore3@proton.me>	2024-09-17 23:03:28 -04:00
kingbri	a34bd9a684	Config: Alter YAML generation script for formatting adherence Properly add comments and newlines where they need to go. Signed-off-by: kingbri <bdashore3@proton.me>	2024-09-17 22:44:42 -04:00

1 2

74 Commits