tabbyAPI

mirror of https://github.com/theroyallab/tabbyAPI.git synced 2026-03-15 00:07:28 +00:00

Author	SHA1	Message	Date
kingbri	ab10b263fd	Model: Add override base seq len Some models (such as mistral and mixtral) set their base sequence length to 32k due to assumptions of support for sliding window attention. Therefore, add this parameter to override the base sequence length of a model which helps with auto-calculation of rope alpha. If auto-calculation of rope alpha isn't being used, the max_seq_len parameter works fine as is. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-20 00:45:39 -05:00
kingbri	c9e43e51aa	API: Add route for draft model list Does the same thing as model list except with draft models. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-19 23:45:53 -05:00
kingbri	ce2602df9a	Model: Fix max seq len handling Previously, the max sequence length was overridden by the user's config and never took the model's config.json into account. Now, set the default to 4096, but include config.prepare when selecting the max sequence length. The yaml and API request now serve as overrides rather than parameters. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-19 23:37:52 -05:00
kingbri	c3f7898967	OAI: Add logit bias support Use exllamav2's token bias which is the functional equivalent of OAI's logit bias parameter. Strings are cast to integers on request, and an error is raised if an invalid value is passed. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-18 23:53:47 -05:00
kingbri	bc21f0bbc0	OAI: Add field aliasing Repetition penalty range needs field aliases to support multiple parameter calls. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-18 23:53:47 -05:00
kingbri	de9a19b5d3	Templating: Add generation prompt appending Append generation prompts if given the flag on an OAI chat completion request. This appends the "assistant" message to the instruct prompt. Defaults to true since this is intended behavior. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-18 23:53:47 -05:00
kingbri	a87e474660	OAI: Fix chat completion validation Validation wasn't properly run on older pydantic, so ChatCompletionRespChoice was being sent instead of a ChatCompletionMessage when streaming responses. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-18 23:53:47 -05:00
kingbri	e895eaa4bd	OAI: Clarify types in docs Adding field descriptions shows which parameters are used solely for OAI compliance and not actually parsed in the model code. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-18 23:53:47 -05:00
kingbri	51ca1ff396	Tree: Switch to Pydantic 2 Pydantic 2 has more modern methods and stability compared to Pydantic 1 Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-18 23:53:47 -05:00
kingbri	f631dd6ff7	Templates: Switch to Jinja2 Jinja2 is a lightweight template parser that's used in Transformers for parsing chat completions. It's much more efficient than Fastchat and can be imported as part of requirements. Also allows for unblocking Pydantic's version. Users now have to provide their own template if needed. A separate repo may be usable for common prompt template storage. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-18 23:53:47 -05:00
kingbri	ad8807a830	Model: Add support for num_experts_by_token New parameter that's safe to edit in exllamav2 v0.0.11. Only recommended for people who know what they're doing. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-17 18:03:01 -05:00
kingbri	70fbee3edd	OAI: Fix model parameter placement Accidentally edited the Model Card parameters vs the model load request ones. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-17 14:36:28 -05:00
kingbri	1d0bdfa77c	Model + OAI: Fix parameter parsing Rope alpha changes don't require removing the 1.0 default from Rope scale. Keep defaults when possible to avoid errors. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-17 14:28:18 -05:00
Veden	3e57125025	OAI: adding optional draft model properties for draft_rope alpha and scale (#28) * OAI: adding optional draft model properties for draft_rope alpha and scale * Forgot to set the properties to None	2023-12-17 19:23:45 +00:00
kingbri	1a331afe3a	OAI: Add cache_mode parameter to model Mistakenly forgot that the user can choose what cache mode to use when loading a model. Also add when fetching model info. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-16 02:47:50 -05:00
kingbri	ed868fd262	OAI: Remove unused parameters Seed and low_mem aren't used, so comment them out. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-15 14:56:43 -05:00
kingbri	083df7d585	Tree: Add generation logging support Generations can be logged in the console along with sampling parameters if the user enables it in config. Metrics are always logged at the end of each prompt. In addition, the model endpoint tells the user if they're being logged or not for transparency purposes. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-12 23:43:35 -05:00
kingbri	db87efde4a	OAI: Add ability to specify fastchat prompt template Sometimes fastchat may not be able to detect the prompt template from the model path. Therefore, add the ability to set it in config.yml or via the request object itself. Also send the provided prompt template on model info request. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-10 15:43:58 -05:00
kingbri	fd9f3eac87	Model: Add params to current model endpoint Grabs the current model rope params, max seq len, and the draft model if applicable. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-10 00:40:56 -05:00
kingbri	5ae2a91c04	Tree: Use unwrap and coalesce for optional handling Python doesn't have proper handling of optionals. The only way to handle them is checking via an if statement if the value is None or by using the "or" keyword to unwrap optionals. Previously, I used the "or" method to unwrap, but this caused issues due to falsy values falling back to the default. This is especially the case with booleans, where "False" changed to "True". Instead, add two new functions: unwrap and coalesce. Both function to properly implement a functional way of "None" coalescing. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-09 21:52:17 -05:00
DocShotgun	7380a3b79a	Implement lora support (#24) * Model: Implement basic lora support * Add ability to load loras from config on launch * Supports loading multiple loras and lora scaling * Add function to unload loras * Colab: Update for basic lora support * Model: Test vram alloc after lora load, add docs * Git: Add loras folder to .gitignore * API: Add basic lora-related endpoints * Add /loras/ endpoint for querying available loras * Add /model/lora endpoint for querying currently loaded loras * Add /model/lora/load endpoint for loading loras * Add /model/lora/unload endpoint for unloading loras * Move lora config-checking logic to main.py for better compat with API endpoints * Revert bad CRLF line ending changes * API: Add basic lora-related endpoints (fixed) * Add /loras/ endpoint for querying available loras * Add /model/lora endpoint for querying currently loaded loras * Add /model/lora/load endpoint for loading loras * Add /model/lora/unload endpoint for unloading loras * Move lora config-checking logic to main.py for better compat with API endpoints * Model: Unload loras first when unloading model * API + Models: Cleanup lora endpoints and functions Condenses down endpoint and model load code. Also makes the routes behave the same way as model routes to help not confuse the end user. Signed-off-by: kingbri <bdashore3@proton.me> * Loras: Optimize load endpoint Return successes and failures along with consolidating the request to the rewritten load_loras function. Signed-off-by: kingbri <bdashore3@proton.me> --------- Co-authored-by: kingbri <bdashore3@proton.me> Co-authored-by: DocShotgun <126566557+DocShotgun@users.noreply.github.com>	2023-12-08 23:38:08 -05:00
kingbri	f8e9e22c43	API: Fix model load endpoint with draft Draft wasn't being parsed correctly with the new changes which removed the draft_enabled bool. There's still some more work to be done with returning exceptions. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-06 18:05:55 -05:00
kingbri	6a71890d45	Model: Fix sampler bugs Lots of bugs were unearthed when switching to the new fallback changes. Fix them and make sure samplers are being set properly. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-06 17:29:58 -05:00
kingbri	61f6e51fdb	OAI: Add separator style fallback Some models may return None for separator style with FastChat. Fall back to LLAMA2 if this is the case. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-01 23:30:19 -05:00
kingbri	aef411bed5	OAI: Fix chat completion streaming Chat completions require a finish reason to be provided in the OAI spec once the streaming is completed. This is different from a non-streaming chat completion response. Also fix some errors that were raised from the endpoint. References #15 Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-01 00:14:24 -05:00
kingbri	e703c716ee	Merge branch 'main' of https://github.com/ziadloo/tabbyAPI into ziadloo-main	2023-11-30 01:01:48 -05:00
kingbri	3957316b79	Revert "API: Rename repetition_decay -> repetition_slope" This reverts commit `cad144126f`. Change this parameter back to repetition_decay. This is different than rep_pen_slope used in other backends such as kobold and NAI. Still keep the fallback condition though. Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-29 22:03:45 -05:00
kingbri	cad144126f	API: Rename repetition_decay -> repetition_slope Also fix the fallback to use 0 for sanity checking and validation. Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-29 01:13:05 -05:00
kingbri	5cbf7f13da	OAI: Fix repetition range Alias repetition_penalty_range to repetition_range since that's used as an internal variable. Perhaps in the future, there should be a function that allows for iterating through request aliases and give a default value. Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-29 00:53:19 -05:00
Mehran Ziadloo	ead503c75b	Adding token usage support	2023-11-27 20:05:05 -08:00
kingbri	d47c39da54	API: Don't include draft directory in response The draft directory should be returned for a draft model request (TBD). Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-23 00:07:56 -05:00
kingbri	71b9a53336	API: Add temperature_last support Documented in previous commits. Also make sure that for version checking, check the value of kwargs instead of if the key is present since requests pass default values. Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-21 21:20:59 -05:00
kingbri	f47919b1d3	API: Add draft model support Models can be loaded with a child object called "draft" in the POST request. Again, models need to be located within the draft model dir to get loaded. Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-19 00:32:25 -05:00
kingbri	d627d14385	API: Fix exceptions and defaults Stop conditions were None, causing the model to error out when trying to add the EOS token to a None value. Authentication failed when Bearer contained an empty string. To fix this, add a condition which checks array length. Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-17 17:56:05 -05:00
kingbri	282b5b2931	API: Fix responses and some params Responses were not being properly sent as JSON. Only run pydantic's JSON function on stream responses. FastAPI does the rest with static responses. Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-16 17:11:55 -05:00
kingbri	60eb076b43	Tree: Basic formatting and comments Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-16 11:48:40 -05:00
kingbri	2248705c4a	Requirements: Don't force fastchat installation Fastchat requires a lot of dependencies such as transformers, peft, and accelerate which are heavy. This is not useful unless a user wants to add a shim for the chat completion endpoint. Instead, try importing fastchat and notify the console of the error. Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-16 01:26:46 -05:00
kingbri	5e8419ec0c	OAI: Add chat completions endpoint Chat completions is the endpoint that will be used by OAI in the future. Makes sense to support it even though the completions endpoint will be used more often. Also unify common parameters between the chat completion and completion requests since they're very similar. Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-16 01:06:07 -05:00
kingbri	d0b6b11068	OAI: Make freq and presence pen floats Also rename the completions typing file. Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-15 00:55:15 -05:00
kingbri	126afdfdc2	Model: Fix gpu split params GPU split auto is a bool and GPU split is an array of integers for GBs to allocate per GPU. Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-15 00:55:15 -05:00
kingbri	ea91d17a11	Api: Add ban_eos_token and add_bos_token support Adds the ability for the client to specify whether to add the BOS token and ban the EOS token. Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-15 00:55:15 -05:00
kingbri	8fea5391a8	Api: Add token endpoints Support for encoding and decoding with various parameters. Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-15 00:55:15 -05:00
kingbri	4670a77c26	API: Don't use response_class This arg in routes caused many errors and isn't even needed for responses. Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-14 22:09:26 -05:00
kingbri	b625bface9	OAI: Add API-based model loading/unloading and auth routes Models can be loaded and unloaded via the API. Also add authentication to use the API and for administrator tasks. Both types of authorization use different keys. Also fix the unload function to properly free all used vram. Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-14 01:17:19 -05:00
kingbri	47343e2f1a	OAI: Add models support The models endpoint fetches all the models that OAI has to offer. However, since this is an OAI clone, just list the models inside the user's configured model directory instead. Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-13 21:38:34 -05:00
kingbri	eee8b642bd	OAI: Implement completion API endpoint Add support for /v1/completions with the option to use streaming if needed. Also rewrite API endpoints to use async when possible since that improves request performance. Model container parameter names also needed rewrites, and fallback cases are set to their disabled values. Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-13 18:31:26 -05:00

46 Commits
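
Commit 5ae2a91c04 describes replacing "or"-based defaults with two helpers, unwrap and coalesce, because "or" treats falsy values such as False or 0 as missing. The log does not show their code, so the following is only a minimal sketch of how such None-coalescing helpers might look. The signatures and bodies are assumptions for illustration, not the tabbyAPI source.

```python
from typing import Optional, TypeVar

T = TypeVar("T")


def unwrap(wrapped: Optional[T], default: Optional[T] = None) -> Optional[T]:
    """Return `wrapped` unless it is None, in which case return `default`.

    Hypothetical signature; only the name comes from the commit message.
    """
    return default if wrapped is None else wrapped


def coalesce(*args):
    """Return the first argument that is not None, or None if all are None.

    Hypothetical signature; only the name comes from the commit message.
    """
    return next((arg for arg in args if arg is not None), None)


# The pitfall described in the commit: "or" discards an explicit False.
user_value = False
via_or = user_value or True            # falls back to the default
via_unwrap = unwrap(user_value, True)  # keeps the explicit False
```

With helpers like these, an explicit False or 0 sent in a request survives, and only a genuinely missing (None) value falls back to the default.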