tabbyAPI

mirror of https://github.com/theroyallab/tabbyAPI.git synced 2026-03-14 15:57:27 +00:00

Author	SHA1	Message	Date
DocShotgun	7380a3b79a	Implement lora support (#24 ) * Model: Implement basic lora support * Add ability to load loras from config on launch * Supports loading multiple loras and lora scaling * Add function to unload loras * Colab: Update for basic lora support * Model: Test vram alloc after lora load, add docs * Git: Add loras folder to .gitignore * API: Add basic lora-related endpoints * Add /loras/ endpoint for querying available loras * Add /model/lora endpoint for querying currently loaded loras * Add /model/lora/load endpoint for loading loras * Add /model/lora/unload endpoint for unloading loras * Move lora config-checking logic to main.py for better compat with API endpoints * Revert bad CRLF line ending changes * API: Add basic lora-related endpoints (fixed) * Add /loras/ endpoint for querying available loras * Add /model/lora endpoint for querying currently loaded loras * Add /model/lora/load endpoint for loading loras * Add /model/lora/unload endpoint for unloading loras * Move lora config-checking logic to main.py for better compat with API endpoints * Model: Unload loras first when unloading model * API + Models: Cleanup lora endpoints and functions Condenses down endpoint and model load code. Also makes the routes behave the same way as model routes to help not confuse the end user. Signed-off-by: kingbri <bdashore3@proton.me> * Loras: Optimize load endpoint Return successes and failures along with consolidating the request to the rewritten load_loras function. Signed-off-by: kingbri <bdashore3@proton.me> --------- Co-authored-by: kingbri <bdashore3@proton.me> Co-authored-by: DocShotgun <126566557+DocShotgun@users.noreply.github.com>	2023-12-08 23:38:08 -05:00
DocShotgun	39f7a2aabd	Expose draft_rope_scale	2023-12-05 12:59:32 -08:00
kingbri	c67c9f6d66	Model + Config: Remove low_mem option Low_mem doesn't work in exl2 and it was an experimental option to begin with. Keep the loading code commented out in case it gets fixed in the future. A better alternative is to use 8bit cache which works and helps save VRAM. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-03 01:07:42 -05:00
kingbri	6493b1d2aa	OAI: Add ability to send dummy models Some APIs require an OAI model to be sent against the models endpoint. Fix this by adding a GPT 3.5 turbo entry as first in the list to cover as many APIs as possible. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-01 00:27:28 -05:00
kingbri	581e1fc219	Sample config: Remove unused value Draft models are specified in the draft sublock. Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-19 01:16:03 -05:00
kingbri	e0e93c103b	Sample config: Uncomment all parameters This helps clarify things when users are configuring for the first time. For example, some users were putting the model name in the "model" block instead of the "model_name" field. Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-19 01:12:07 -05:00
kingbri	27ebec3b35	Model: Add speculative decoding support via config Speculative decoding makes use of draft models that ingest the prompt before forwarding it to the main model. Add options in the config to support this. API options will occur in a different commit. Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-18 01:42:20 -05:00
waldfee	78a6587b95	add cache_mode and draft_model_dir to config_sample.yml	2023-11-17 22:08:31 +01:00
kingbri	08a183540b	Config: Add warning on exceptions and clarify parameters Due to how YAML works, double quotes are bad. Specify a linter in the top of the config_sample file. Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-16 22:19:47 -05:00
kingbri	03f45cb0a3	Tree: Update documentation and configs Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-16 02:30:33 -05:00
kingbri	b625bface9	OAI: Add API-based model loading/unloading and auth routes Models can be loaded and unloaded via the API. Also add authentication to use the API and for administrator tasks. Both types of authorization use different keys. Also fix the unload function to properly free all used vram. Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-14 01:17:19 -05:00
kingbri	a10c14d357	Config: Switch to YAML and add load progress YAML is a more flexible format when it comes to configuration. Commandline arguments are difficult to remember and configure especially for an API with complicated commandline names. Rather than using half-baked textfiles, implement a proper config solution. Also add a progress bar when loading models in the commandline. Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-12 00:21:16 -05:00

12 Commits