tabbyAPI

mirror of https://github.com/theroyallab/tabbyAPI.git synced 2026-03-14 15:57:27 +00:00

Author	SHA1	Message	Date
kingbri	db87efde4a	OAI: Add ability to specify fastchat prompt template Sometimes fastchat may not be able to detect the prompt template from the model path. Therefore, add the ability to set it in config.yml or via the request object itself. Also send the provided prompt template on model info request. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-10 15:43:58 -05:00
kingbri	fd9f3eac87	Model: Add params to current model endpoint Grabs the current model rope params, max seq len, and the draft model if applicable. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-10 00:40:56 -05:00
kingbri	0f4290f05c	Model: Format Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-09 22:48:42 -05:00
kingbri	5ae2a91c04	Tree: Use unwrap and coalesce for optional handling Python doesn't have proper handling of optionals. The only way to handle them is checking via an if statement if the value is None or by using the "or" keyword to unwrap optionals. Previously, I used the "or" method to unwrap, but this caused issues due to falsy values falling back to the default. This is especially the case with booleans where "False" changed to "True". Instead, add two new functions: unwrap and coalesce. Both function to properly implement a functional way of "None" coalescing. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-09 21:52:17 -05:00
DocShotgun	7380a3b79a	Implement lora support (#24) * Model: Implement basic lora support * Add ability to load loras from config on launch * Supports loading multiple loras and lora scaling * Add function to unload loras * Colab: Update for basic lora support * Model: Test vram alloc after lora load, add docs * Git: Add loras folder to .gitignore * API: Add basic lora-related endpoints * Add /loras/ endpoint for querying available loras * Add /model/lora endpoint for querying currently loaded loras * Add /model/lora/load endpoint for loading loras * Add /model/lora/unload endpoint for unloading loras * Move lora config-checking logic to main.py for better compat with API endpoints * Revert bad CRLF line ending changes * API: Add basic lora-related endpoints (fixed) * Add /loras/ endpoint for querying available loras * Add /model/lora endpoint for querying currently loaded loras * Add /model/lora/load endpoint for loading loras * Add /model/lora/unload endpoint for unloading loras * Move lora config-checking logic to main.py for better compat with API endpoints * Model: Unload loras first when unloading model * API + Models: Cleanup lora endpoints and functions Condenses down endpoint and model load code. Also makes the routes behave the same way as model routes to help not confuse the end user. Signed-off-by: kingbri <bdashore3@proton.me> * Loras: Optimize load endpoint Return successes and failures along with consolidating the request to the rewritten load_loras function. Signed-off-by: kingbri <bdashore3@proton.me> --------- Co-authored-by: kingbri <bdashore3@proton.me> Co-authored-by: DocShotgun <126566557+DocShotgun@users.noreply.github.com>	2023-12-08 23:38:08 -05:00
kingbri	fa1e99daf6	Model: Remove unused print statement Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-07 21:13:52 -05:00
kingbri	6a71890d45	Model: Fix sampler bugs Lots of bugs were unearthed when switching to the new fallback changes. Fix them and make sure samplers are being set properly. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-06 17:29:58 -05:00
kingbri	4c0e686e7d	Model: Cleanup and fix fallbacks Use the standard "dict.get("key") or default" to handle fetching values from kwargs and get a fallback value without possible errors. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-05 23:28:16 -05:00
kingbri	d8f7b93c54	Model: Fix fetching of draft args Mistakenly fetched these from parent kwargs instead of the scoped draft_config var. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-05 22:24:27 -05:00
DocShotgun	3f2fcbcc45	Add fallback to draft_rope_scale to 1.0	2023-12-05 18:51:36 -08:00
DocShotgun	39f7a2aabd	Expose draft_rope_scale	2023-12-05 12:59:32 -08:00
kingbri	c67c9f6d66	Model + Config: Remove low_mem option Low_mem doesn't work in exl2 and it was an experimental option to begin with. Keep the loading code commented out in case it gets fixed in the future. A better alternative is to use 8bit cache which works and helps save VRAM. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-03 01:07:42 -05:00
kingbri	27fc0c0069	Model: Cleanup and compartmentalize auto rope functions Also handle an edge case if ratio <= 1 since NTK scaling is only used for values > 1. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-03 01:05:09 -05:00
DocShotgun	bd2c5d0d09	Force auto-alpha to 1.0 if config ctx == base ctx	2023-12-02 21:19:59 -08:00
DocShotgun	1c398b0be7	Add automatic NTK-aware alpha scaling to model * enables automatic calculation of NTK-aware alpha scaling for models if the rope_alpha arg is not passed in the config, using the same formula used for draft models	2023-12-02 21:02:29 -08:00
kingbri	ae69b18583	API: Use FastAPI streaming instead of sse_starlette sse_starlette kept firing a ping response if it was taking too long to set an event. Rather than using a hacky workaround, switch to FastAPI's inbuilt streaming response and construct SSE requests with a utility function. This helps the API become more robust and removes an extra requirement. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-01 01:54:35 -05:00
kingbri	8a5ac5485b	Model: Fix rounding generated_tokens is always a whole number. Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-30 01:55:46 -05:00
kingbri	e703c716ee	Merge branch 'main' of https://github.com/ziadloo/tabbyAPI into ziadloo-main	2023-11-30 01:01:48 -05:00
kingbri	3957316b79	Revert "API: Rename repetition_decay -> repetition_slope" This reverts commit `cad144126f`. Change this parameter back to repetition_decay. This is different than rep_pen_slope used in other backends such as kobold and NAI. Still keep the fallback condition though. Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-29 22:03:45 -05:00
kingbri	94696543bc	Model: Warn user if context > max_seq_len Unlike other backends, tabby attempts to generate even if the context is greater than the max sequence length via truncation of the given context. Rather than artificially erroring out, give a warning that outputted console metrics are going to be incorrect and to make sure that context <= max_seq_len. Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-29 01:35:32 -05:00
kingbri	cad144126f	API: Rename repetition_decay -> repetition_slope Also fix the fallback to use 0 for sanity checking and validation. Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-29 01:13:05 -05:00
Mehran Ziadloo	b0c42d0f05	Leveraging local variables	2023-11-27 20:56:56 -08:00
Mehran Ziadloo	ead503c75b	Adding token usage support	2023-11-27 20:05:05 -08:00
kingbri	d47c39da54	API: Don't include draft directory in response The draft directory should be returned for a draft model request (TBD). Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-23 00:07:56 -05:00
kingbri	71b9a53336	API: Add temperature_last support Documented in previous commits. Also make sure that for version checking, check the value of kwargs instead of if the key is present since requests pass default values. Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-21 21:20:59 -05:00
turboderp	3337fe6acc	Warning if unsupported samplers are used	2023-11-21 18:35:22 +01:00
turboderp	a54de11cf3	Add new samplers	2023-11-21 18:16:53 +01:00
Veden	f960fac8ff	Fix incorrect ratio calculation for draft model	2023-11-19 13:12:53 -08:00
kingbri	4cddd0400c	Model: Fix draft model loading Use draft_config to find the path instead of kwargs. Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-19 02:04:02 -05:00
kingbri	31bc418795	Model: Add context in response output When printing to the console, give information about the context (ingestion token count). Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-19 00:49:32 -05:00
kingbri	6b9af58cc1	Tree: Fix extraneous bugs and update T/s print Model: Add extra information to print and fix the divide by zero error. Auth: Fix validation of API and admin keys to look for the entire key. References #7 and #6 Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-18 22:34:40 -05:00
Brian Dashore	b2410a0436	Merge pull request #4 from waldfee/config_samples Adds draft model support to config.yml	2023-11-18 13:16:23 -05:00
kingbri	27ebec3b35	Model: Add speculative decoding support via config Speculative decoding makes use of draft models that ingest the prompt before forwarding it to the main model. Add options in the config to support this. API options will occur in a different commit. Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-18 01:42:20 -05:00
kingbri	2ad79cb9ea	Model: Add tokens in responses Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-17 23:33:48 -05:00
kingbri	9dfa580b1e	Model: Add tokens/second output Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-17 01:16:20 -05:00
kingbri	d5551352bf	Model: Fix parsing of stop conditions Add the EOS token into stop strings after checking kwargs. If ban_eos_token is on, don't add the EOS token in for extra measure. Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-16 17:15:33 -05:00
kingbri	126afdfdc2	Model: Fix gpu split params GPU split auto is a bool and GPU split is an array of integers for GBs to allocate per GPU. Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-15 00:55:15 -05:00
kingbri	ea91d17a11	Api: Add ban_eos_token and add_bos_token support Adds the ability for the client to specify whether to add the BOS token and ban the EOS token. Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-15 00:55:15 -05:00
kingbri	8fea5391a8	Api: Add token endpoints Support for encoding and decoding with various parameters. Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-15 00:55:15 -05:00
kingbri	b625bface9	OAI: Add API-based model loading/unloading and auth routes Models can be loaded and unloaded via the API. Also add authentication to use the API and for administrator tasks. Both types of authorization use different keys. Also fix the unload function to properly free all used vram. Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-14 01:17:19 -05:00
kingbri	47343e2f1a	OAI: Add models support The models endpoint fetches all the models that OAI has to offer. However, since this is an OAI clone, just list the models inside the user's configured model directory instead. Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-13 21:38:34 -05:00
kingbri	eee8b642bd	OAI: Implement completion API endpoint Add support for /v1/completions with the option to use streaming if needed. Also rewrite API endpoints to use async when possible since that improves request performance. Model container parameter names also needed rewrites as well and set fallback cases to their disabled values. Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-13 18:31:26 -05:00
turboderp	4fa4386275	Add new samplers	2023-11-12 08:12:08 +01:00
kingbri	a10c14d357	Config: Switch to YAML and add load progress YAML is a more flexible format when it comes to configuration. Commandline arguments are difficult to remember and configure especially for an API with complicated commandline names. Rather than using half-baked textfiles, implement a proper config solution. Also add a progress bar when loading models in the commandline. Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-12 00:21:16 -05:00
kingbri	5d32aa02cd	Tree: Update to use ModelContainer and args Use command-line arguments to load an initial model if necessary. API routes are broken, but we should be using the container from now on as a primary interface with the exllama2 library. Also these args should be turned into a YAML configuration file in the future. Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-10 23:19:54 -05:00
turboderp	9d34479e3e	Model container with generator logic, initial	2023-11-11 02:53:00 +01:00

46 Commits
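
Note on commit 5ae2a91c04: the message describes replacing `dict.get("key") or default` fallbacks with two helpers named unwrap and coalesce, because "or" treats every falsy value (False, 0, an empty string) as missing. The log does not show the actual implementation, so the following is only a minimal sketch of what "None" coalescing could look like; the signatures, the `kwargs` contents, and the variable names are assumptions made for illustration.

```python
from typing import Optional, TypeVar

T = TypeVar("T")


def unwrap(value: Optional[T], default: T) -> T:
    """Return value unless it is None, in which case return default.

    Hypothetical signature; the commit names the function but not its shape.
    """
    return value if value is not None else default


def coalesce(*values):
    """Return the first argument that is not None, or None if all are None.

    Hypothetical signature; shown only to illustrate None coalescing.
    """
    return next((v for v in values if v is not None), None)


# Illustrative request arguments (not taken from the repository).
kwargs = {"add_bos_token": False}

# The "or" fallback treats an explicit False as missing and flips it to True.
via_or = kwargs.get("add_bos_token") or True

# unwrap only falls back when the value is actually None.
via_unwrap = unwrap(kwargs.get("add_bos_token"), True)

# coalesce picks the first non-None candidate, keeping falsy values such as 0.
first_set = coalesce(kwargs.get("missing_key"), 0, 5)
```

Under these assumptions, `via_or` ends up as True while `via_unwrap` stays False, which matches the boolean problem the commit message describes.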