tabbyAPI

mirror of https://github.com/theroyallab/tabbyAPI.git synced 2026-03-15 00:07:28 +00:00

Author	SHA1	Message	Date
kingbri	1a8afcb6ad	Generator: Fix semaphore scheduling Non-streaming tasks were not regulated by the semaphore, causing these tasks to interfere with streaming generations. Add helper functions to take in both sync and async functions for callbacks and sequential blocking with the semaphore. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-21 21:39:45 -05:00
kingbri	bee758dae9	Config: Clarify rope parameters Blank = automatic calculation of alpha value. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-20 21:15:06 -05:00
kingbri	5728b9fffb	Model: Don't error out if a generation is empty When stream is false, the generation can be empty, which means that there's no chunks present in the final generation array, causing an error. Instead, return a dummy value if generation is falsy (empty array or None) Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-20 00:51:33 -05:00
kingbri	ab10b263fd	Model: Add override base seq len Some models (such as mistral and mixtral) set their base sequence length to 32k due to assumptions of support for sliding window attention. Therefore, add this parameter to override the base sequence length of a model which helps with auto-calculation of rope alpha. If auto-calculation of rope alpha isn't being used, the max_seq_len parameter works fine as is. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-20 00:45:39 -05:00
Brian Dashore	5368ed7b64	Merge pull request #31 from veryamazinglystupid/main cuda -> 12, pydantic error fixed.	2023-12-20 00:04:51 -05:00
kingbri	5fbb37405f	Colab: Remove the pydantic hotfix Requirements.txt is now pinned to install pydantic >= 2.0.0 Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-20 00:01:58 -05:00
kingbri	c9e43e51aa	API: Add route for draft model list Does the same thing as model list except with draft models. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-19 23:45:53 -05:00
kingbri	ce2602df9a	Model: Fix max seq len handling Previously, the max sequence length was overridden by the user's config and never took the model's config.json into account. Now, set the default to 4096, but include config.prepare when selecting the max sequence length. The yaml and API request now serve as overrides rather than parameters. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-19 23:37:52 -05:00
kingbri	d3246747c0	Templates: Attempt loading from model config Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-19 22:58:47 -05:00
kingbri	da69ad8cd3	Requirements: Pin versions for some dependencies Pydantic and Jinja2 need pinned versions. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-19 21:48:04 -05:00
kingbri	1fd38c61de	API: Remove model check dependency for lora list This isn't needed for listing stuff. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-19 21:35:29 -05:00
veryamazinglystupid	12bf7a0174	fix the colab, pydantic error :3	2023-12-19 19:46:57 +05:30
kingbri	0a144688c6	Templates: Add clarity statements Lets the user know if a file not found (OSError) occurs and prints the applied template on model load. Also fix some remaining references to fastchat. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-19 08:13:04 -05:00
kingbri	0d76ed9b8b	Revert "Start: Add an argument parser to batch file" This reverts commit `097c298c39`.	2023-12-19 00:01:27 -05:00
kingbri	45e2987622	Start: Fix batch file condition Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-18 23:57:30 -05:00
kingbri	097c298c39	Start: Add an argument parser to batch file Used for future arguments. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-18 23:53:47 -05:00
kingbri	c3f7898967	OAI: Add logit bias support Use exllamav2's token bias which is the functional equivalent of OAI's logit bias parameter. Strings are cast to integers on request, and an error is raised if an invalid value is passed. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-18 23:53:47 -05:00
kingbri	46f6dc824e	Scripts: Add requirements update to start script Also add an argument to skip the requirements if needed. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-18 23:53:47 -05:00
kingbri	1f2cc8a47b	Templates: Update folder Move README to the separate templates repo. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-18 23:53:47 -05:00
kingbri	bc21f0bbc0	OAI: Add field aliasing Repetition penalty range needs field aliases to support multiple parameter calls. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-18 23:53:47 -05:00
kingbri	124e39df26	Remove fschat from Dockerfile Fastchat is removed from all dependencies Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-18 23:53:47 -05:00
kingbri	de9a19b5d3	Templating: Add generation prompt appending Append generation prompts if given the flag on an OAI chat completion request. This appends the "assistant" message to the instruct prompt. Defaults to true since this is intended behavior. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-18 23:53:47 -05:00
kingbri	041070fd6e	Update gitignore Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-18 23:53:47 -05:00
kingbri	417cb958fa	Auth: Only regenerate auth on OSError OSError means that a file wasn't found, which means auth tokens should be regenerated. Otherwise, fire the error and exit. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-18 23:53:47 -05:00
kingbri	a87e474660	OAI: Fix chat completion validation Validation wasn't properly run on older pydantic, so ChatCompletionRespChoice was being sent instead of a ChatCompletionMessage when streaming responses. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-18 23:53:47 -05:00
kingbri	7cbc08fc72	Templates: Add auto-detection from path This replicates FastChat's model path detection. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-18 23:53:47 -05:00
kingbri	e895eaa4bd	OAI: Clarify types in docs Adding field descriptions show which parameters are used solely for OAI compliance and not actually parsed in the model code. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-18 23:53:47 -05:00
kingbri	51ca1ff396	Tree: Switch to Pydantic 2 Pydantic 2 has more modern methods and stability compared to Pydantic 1 Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-18 23:53:47 -05:00
kingbri	f631dd6ff7	Templates: Switch to Jinja2 Jinja2 is a lightweight template parser that's used in Transformers for parsing chat completions. It's much more efficient than Fastchat and can be imported as part of requirements. Also allows for unblocking Pydantic's version. Users now have to provide their own template if needed. A separate repo may be usable for common prompt template storage. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-18 23:53:47 -05:00
kingbri	95fd0f075e	Model: Fix no flash attention Was being called wrong from config. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-17 23:31:58 -05:00
kingbri	ad8807a830	Model: Add support for num_experts_by_token New parameter that's safe to edit in exllamav2 v0.0.11. Only recommended for people who know what they're doing. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-17 18:03:01 -05:00
kingbri	70fbee3edd	OAI: Fix model parameter placement Accidentally edited the Model Card parameters vs the model load request ones. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-17 14:36:28 -05:00
kingbri	1d0bdfa77c	Model + OAI: Fix parameter parsing Rope alpha changes don't require removing the 1.0 default from Rope scale. Keep defaults when possible to avoid errors. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-17 14:28:18 -05:00
Veden	3e57125025	OAI: adding optional draft model properties for draft_rope alpha and scale (#28) * OAI: adding optional draft model properties for draft_rope alpha and scale * Forgot to set the properties to None	2023-12-17 19:23:45 +00:00
kingbri	528d58f841	Requirements: Fix AMD Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-17 00:45:43 -05:00
kingbri	f196f1177d	Requirements: Update exllamav2 to 0.0.11 Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-16 19:33:42 -05:00
kingbri	1a331afe3a	OAI: Add cache_mode parameter to model Mistakenly forgot that the user can choose what cache mode to use when loading a model. Also add when fetching model info. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-16 02:47:50 -05:00
kingbri	ed868fd262	OAI: Remove unused parameters Seed and low_mem aren't used, so comment them out. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-15 14:56:43 -05:00
kingbri	59729e2a4a	Tests: Fix linting Also change how wheel_test works for safe import testing rather than trying to import the package which can cause system issues. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-13 23:05:50 -05:00
kingbri	036ba2669c	Auth: Migrate to Pydantic It's easier to work with Pydantic dataclasses rather than standard python classes. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-12 23:58:22 -05:00
kingbri	eb8ccb9783	Tree: Fix linter issues Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-12 23:58:19 -05:00
kingbri	083df7d585	Tree: Add generation logging support Generations can be logged in the console along with sampling parameters if the user enables it in config. Metrics are always logged at the end of each prompt. In addition, the model endpoint tells the user if they're being logged or not for transparency purposes. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-12 23:43:35 -05:00
kingbri	b364de1005	Update README Add alternatives if the user doesn't agree with the focus of TabbyAPI. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-10 16:05:46 -05:00
kingbri	db87efde4a	OAI: Add ability to specify fastchat prompt template Sometimes fastchat may not be able to detect the prompt template from the model path. Therefore, add the ability to set it in config.yml or via the request object itself. Also send the provided prompt template on model info request. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-10 15:43:58 -05:00
kingbri	9f195af5ad	Main: Fix function calls Some function names were declared twice. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-10 13:28:21 -05:00
kingbri	fd9f3eac87	Model: Add params to current model endpoint Grabs the current model rope params, max seq len, and the draft model if applicable. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-10 00:40:56 -05:00
kingbri	0f4290f05c	Model: Format Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-09 22:48:42 -05:00
kingbri	5ae2a91c04	Tree: Use unwrap and coalesce for optional handling Python doesn't have proper handling of optionals. The only way to handle them is checking via an if statement if the value is None or by using the "or" keyword to unwrap optionals. Previously, I used the "or" method to unwrap, but this caused issues due to falsy values falling back to the default. This is especially the case with booleans, where "False" changed to "True". Instead, add two new functions: unwrap and coalesce. Both provide a functional way of "None" coalescing. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-09 21:52:17 -05:00
DocShotgun	7380a3b79a	Implement lora support (#24) * Model: Implement basic lora support * Add ability to load loras from config on launch * Supports loading multiple loras and lora scaling * Add function to unload loras * Colab: Update for basic lora support * Model: Test vram alloc after lora load, add docs * Git: Add loras folder to .gitignore * API: Add basic lora-related endpoints * Add /loras/ endpoint for querying available loras * Add /model/lora endpoint for querying currently loaded loras * Add /model/lora/load endpoint for loading loras * Add /model/lora/unload endpoint for unloading loras * Move lora config-checking logic to main.py for better compat with API endpoints * Revert bad CRLF line ending changes * API: Add basic lora-related endpoints (fixed) * Add /loras/ endpoint for querying available loras * Add /model/lora endpoint for querying currently loaded loras * Add /model/lora/load endpoint for loading loras * Add /model/lora/unload endpoint for unloading loras * Move lora config-checking logic to main.py for better compat with API endpoints * Model: Unload loras first when unloading model * API + Models: Cleanup lora endpoints and functions Condenses down endpoint and model load code. Also makes the routes behave the same way as model routes to help not confuse the end user. Signed-off-by: kingbri <bdashore3@proton.me> * Loras: Optimize load endpoint Return successes and failures along with consolidating the request to the rewritten load_loras function. Signed-off-by: kingbri <bdashore3@proton.me> --------- Co-authored-by: kingbri <bdashore3@proton.me> Co-authored-by: DocShotgun <126566557+DocShotgun@users.noreply.github.com>	2023-12-08 23:38:08 -05:00
kingbri	161c9d2c19	Tests: Fix wheel test Fastchat is named fschat from the package's point of view. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-08 01:15:24 -05:00
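
Commit 5ae2a91c04 describes replacing the "or" idiom with two helpers, unwrap and coalesce, so that falsy values such as False or 0 are no longer mistaken for a missing value. The sketch below illustrates that idea only. The function names come from the commit message, but the signatures and bodies are assumptions and may not match the repository's code.

```python
from typing import Optional, TypeVar

T = TypeVar("T")


def unwrap(wrapped: Optional[T], default: Optional[T] = None) -> Optional[T]:
    """Return `wrapped` unless it is None, in which case return `default`.

    Assumed signature: the commit message names the function but not its arguments.
    """
    if wrapped is None:
        return default
    return wrapped


def coalesce(*args):
    """Return the first argument that is not None, or None if every one is None."""
    return next((arg for arg in args if arg is not None), None)


# The "or" idiom treats every falsy value as missing.
with_or = False or True            # the caller's explicit False is lost
with_unwrap = unwrap(False, True)  # only None falls back to the default

# coalesce picks the first non-None value, even if that value is falsy.
first_set = coalesce(None, 0, 4096)
```

With these definitions, only None triggers the fallback, so a boolean option explicitly set to False keeps its value.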


176 Commits
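
Commit 1a8afcb6ad mentions helper functions that accept both sync and async callbacks and run them sequentially behind a semaphore. A minimal sketch of that pattern using asyncio follows. The names call_with_semaphore and demo are hypothetical, and this is not the repository's implementation.

```python
import asyncio
import inspect


async def call_with_semaphore(semaphore: asyncio.Semaphore, callback):
    """Run a zero-argument sync or async callback while holding the semaphore.

    Hypothetical helper: illustrates the idea from the commit message only.
    """
    async with semaphore:
        result = callback()
        # An async callback returns a coroutine that still has to be awaited.
        if inspect.isawaitable(result):
            result = await result
        return result


async def demo():
    semaphore = asyncio.Semaphore(1)  # one generation at a time
    order = []

    async def streaming_task():
        order.append("stream start")
        await asyncio.sleep(0.01)  # stands in for a long-running generation
        order.append("stream end")
        return "streamed"

    def non_streaming_task():
        order.append("non-stream")
        return "complete"

    results = await asyncio.gather(
        call_with_semaphore(semaphore, streaming_task),
        call_with_semaphore(semaphore, non_streaming_task),
    )
    return results, order


results, order = asyncio.run(demo())
```

Because both kinds of callback pass through the same semaphore, the non-streaming task in this sketch cannot start until the streaming one has finished.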