tabbyAPI

mirror of https://github.com/theroyallab/tabbyAPI.git synced 2026-04-20 14:28:54 +00:00

Author	SHA1	Message	Date
kingbri	dd30d6592a	Merge branch 'main' of https://github.com/theroyallab/tabbyapi into inline	2024-09-03 18:03:17 -04:00
Ben Gitter	70b9fc95de	[WIP] OpenAI Tools Support/Function calling (#154) * returning stop str if exists from gen * added chat template for firefunctionv2 * pulling tool vars from template * adding parsing for tool inputs/outputs * passing tool data from endpoint to chat template, adding tool_start to the stop list * loosened typing on the response tool call, leaning more on the user supplying a quality schema if they want a particular format * non streaming generation prototype * cleaning template * Continued work with type, ingestion into template, and chat template for fire func * Correction - streaming toolcall comes back as delta obj not inside chatcomprespchoice per chat_completion_chunk.py inside OAI lib. * Ruff Formatting * Moved stop string and tool updates out of prompt creation func Updated tool pydantic to match OAI Support for streaming Updated generate tool calls to use flag within chat_template and insert tool reminder * Llama 3.1 chat templates Updated fire func template * renamed llama3.1 to chatml_with_headers.. * update name of template * Support for calling a tool start token rather than the string. Simplified tool_params Warning when gen_settings are being overridden because user set temp to 0 Corrected schema and tools to correct types for function args. Str for some reason * draft groq tool use model template * changed headers to vars for readability (but mostly because some models are weird about newlines after headers, so this is an easier way to change globally) * Clean up comments and code in chat comp * Post processed tool call to meet OAI spec rather than forcing model to write json in a string in the middle of the call. * changes example back to args as json rather than string of json * Standardize chat templates to each other * cleaning/rewording * stop elements can also be ints (tokens) * Cleaning/formatting * added special tokens for tools and tool_response as specified in description * Cleaning * removing aux templates - going to live in llm-promp-templates repo instead * Tree: Format Signed-off-by: kingbri <bdashore3@proton.me> * Chat Completions: Don't include internal tool variables in OpenAPI Use SkipJsonSchema to suppress inclusion with the OpenAPI JSON. The location of these variables may need to be changed in the future. Signed-off-by: kingbri <bdashore3@proton.me> * Templates: Deserialize metadata on template load Since we're only looking for specific template variables that are static in the template, it makes more sense to render when the template is initialized. Signed-off-by: kingbri <bdashore3@proton.me> * Tools: Fix comments Adhere to the format style of comments in the rest of the project. Signed-off-by: kingbri <bdashore3@proton.me> --------- Co-authored-by: Ben Gitter <gitterbd@gmail.com> Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-17 00:16:25 -04:00
Bartowski	c75e911f07	Merge branch 'main' into main	2024-08-14 16:16:15 -04:00
AlpinDale	5adfab1cbd	ruff: formatting	2024-07-26 02:53:14 +00:00
AlpinDale	f20cd330ef	feat: add embeddings support via sentence-transformers	2024-07-26 02:45:07 +00:00
kingbri	9ad69e8ab6	API: Migrate universal routes to core Place OAI specific routes in the appropriate folder. This is in preparation for adding new API servers that can be optionally enabled. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-23 14:08:48 -04:00
Volodymyr Kuznetsov	b149d3398d	OAI: support stream_options argument	2024-07-11 18:37:50 -07:00
Colin Kealty	279e900ea5	Add on the fly model loading to requests	2024-07-11 10:52:10 -04:00
kingbri	27d2d5f3d2	Config + Model: Allow for default fallbacks from config for model loads Previously, the parameters under the "model" block in config.yml only handled the loading of a model on startup. This meant that any subsequent API request required each parameter to be filled out or use a sane default (usually defaults to the model's config.json). However, there are cases where admins may want an argument from the config to apply if the parameter isn't provided in the request body. To help alleviate this, add a mechanism that works like sampler overrides where users can specify a flag that acts as a fallback. Therefore, this change both preserves the source of truth of what parameters the admin is loading and adds some convenience for users that want customizable defaults for their requests. This behavior may change in the future, but I think it solves the issue for now. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-06 17:50:58 -04:00
DocShotgun	156b74f3f0	Revision to paged attention checks (#133) * Model: Clean up paged attention checks * Model: Move cache_size checks after paged attn checks Cache size is only relevant in paged mode * Model: Fix no_flash_attention * Model: Remove no_flash_attention Ability to use flash attention is auto-detected, so this flag is unneeded. Uninstall flash attention to disable it on supported hardware.	2024-06-09 17:28:11 +02:00
DocShotgun	55d979b7a5	Update dependencies, support Python 3.12, update for exl2 0.1.5 (#134) * Dependencies: Add wheels for Python 3.12 * Model: Switch fp8 cache to Q8 cache * Model: Add ability to set draft model cache mode * Dependencies: Bump exllamav2 to 0.1.5 * Model: Support Q6 cache * Config: Add Q6 cache and draft_cache_mode to config sample	2024-06-09 17:27:39 +02:00
Orion	6cc3bd9752	feat: list support in message.content (#122)	2024-06-03 19:57:15 +02:00
kingbri	e95e67a000	OAI: Add validation to "n" n must be greater than 1 to generate. Signed-off-by: kingbri <bdashore3@proton.me>	2024-05-28 00:52:30 -04:00
kingbri	b944f8d756	OAI: Add "n" for non-streaming generations This adds the ability to add multiple choices to a generation. This is only available for non-streaming gens for now, it requires some more work to port over to streaming. Signed-off-by: kingbri <bdashore3@proton.me>	2024-05-28 00:52:30 -04:00
DocShotgun	7ab7ffd562	Tree: Format	2024-05-26 15:48:18 -07:00
DocShotgun	767e6a798a	API + Model: Add support for specifying k/v cache size	2024-05-26 14:17:01 -07:00
kingbri	408c66a1f2	Model: Change FA2 and paged attention checks The dynamic generator requires Flash attention 2.5.7 or higher to be installed. This is only supported on Nvidia's 30 series and higher. If a card is AMD or lower than the 30 series, switch to compatibility mode which functions the same way as the older generator, except without parallel batching and any features that depend on it, such as CFG. Signed-off-by: kingbri <bdashore3@proton.me>	2024-05-25 21:16:14 -04:00
kingbri	6f4012d20d	API: Add preset listing for sampler overrides Querying the overrides list endpoint now returns the selected preset and a list of presets to use. Signed-off-by: kingbri <bdashore3@proton.me>	2024-05-12 01:34:51 -04:00
kingbri	ab526f7278	Revert "API: Remove unnecessary Optional signatures" This reverts commit `7556dcf134`. The Optionals allowed requests to send "null" in the body for optional parameters which should be allowed. Signed-off-by: kingbri <bdashore3@proton.me>	2024-05-02 21:23:48 -04:00
kingbri	7556dcf134	API: Remove unnecessary Optional signatures Optional isn't necessary if the function signature has a default value. Signed-off-by: kingbri <bdashore3@proton.me>	2024-05-01 00:04:52 -04:00
kingbri	50e0b71690	Downloader: Fix handling of include pattern If an include or exclude pattern is provided, include should include all files by default. Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-30 01:13:06 -04:00
kingbri	21a01741c9	Downloader: Add include and exclude parameters These both take an array of glob strings to state what files or directories to include or exclude when parsing the download list. Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-30 00:58:54 -04:00
kingbri	55ccd1baad	API: Add HuggingFace downloader Adds an asynchronous huggingface downloader that uses HF hub to fetch all repo files. The current HF hub package has a snapshot_download function that does not cancel on KeyboardInterrupt. Instead, make a downloader that uses the Rich progress bar styling along with a cancellable interface. Finally, link this to TabbyAPI. Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-29 01:15:02 -04:00
kingbri	fb1d2f34c1	OAI: Add response_prefix and fix BOS token issues in chat completions response_prefix is used to add a prefix before generating the next message. This is used in many cases such as continuing a prompt (see #96). Also if a template has BOS token specified, add_bos_token will append two BOS tokens. Add a check which strips a starting BOS token from the prompt if it exists. Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-25 00:54:43 -04:00
kingbri	515b3c2930	OAI: Tokenize chat completion messages Since chat completion messages are a structure, format the prompt before checking in the tokenizer. Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-15 14:17:16 -04:00
kingbri	2a0aaa2e8a	OAI: Add ability to pass extra vars in jinja templates A chat completion can now declare extra template_vars to pass when a template is rendered, opening up the possibility of using state outside of huggingface's parameters. Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-11 09:49:25 -04:00
kingbri	b1f3baad74	OAI: Add response_format parameter response_format allows a user to request a valid, but arbitrary JSON object from the API. This is a new part of the OAI spec. Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-09 21:33:31 -04:00
kingbri	d759a15559	Model: Fix chunk size handling Wrong class attribute name used for max_attention_size and fixes declaration of the draft model's chunk_size. Also expose the parameter to the end user in both config and model load. Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-07 18:39:19 -04:00
kingbri	56fdfb5f8e	OAI: Add stream to gen params Good for logging. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-21 00:55:44 -04:00
kingbri	5c7fc69ded	API: Fix finish_reason returns OAI expects finish_reason to be "stop" or "length" (there are others, but they're not in the current scope of this project). Make all completions and chat completions responses return this from the model generation itself rather than putting a placeholder. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-18 15:59:28 -04:00
kingbri	3c08f46c51	Endpoints: Add key permission checker This is a definite way to check if an authorized key is API or admin. The endpoint only runs if the key is valid in the first place to keep in line with the API's security model. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-18 00:53:27 -04:00
kingbri	1ec8eb9620	Tree: Format Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-13 00:02:55 -04:00
kingbri	104a6121cb	API: Split into separate folder Moving the API into its own directory helps compartmentalize it and allows for cleaning up the main file to just contain bootstrapping and the entry point. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-12 23:59:30 -04:00

33 Commits