This is the first of many commits that will overhaul the API to be more robust and concurrent. The model is admin-first: the admin can do anything in case something goes awry. Previously, calls to long-running synchronous background tasks would block the entire API, making it ignore terminal signals until generation completed. To fix this, leverage FastAPI's run_in_threadpool to offload long-running tasks to another thread. However, signals to abort the process still left the background thread running and made the terminal hang. This was due to an issue with Uvicorn not propagating the SIGINT signal across threads in its event loop. As a catch-all fix, run the API processes in a separate thread so the main thread can still kill the process if needed. In addition, make request error logging more robust and refer to the console for full error logs rather than building a long message on the client side. Finally, add state checks to verify that a model is fully loaded before generating a completion.
Signed-off-by: kingbri <bdashore3@proton.me>
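The threadpool offload described above can be sketched with the standard library's `asyncio.to_thread`, which follows the same pattern as FastAPI's `run_in_threadpool`; `generate_text` is a hypothetical stand-in for the blocking model call:

```python
import asyncio
import time

def generate_text(prompt: str) -> str:
    # Hypothetical stand-in for a long-running, blocking model call.
    time.sleep(0.1)
    return f"completion for: {prompt}"

async def handle_request(prompt: str) -> str:
    # Offloading the blocking call keeps the event loop free to service
    # other requests and react to signals; FastAPI's run_in_threadpool
    # applies the same pattern inside an async route handler.
    return await asyncio.to_thread(generate_text, prompt)

result = asyncio.run(handle_request("Hello"))
print(result)  # completion for: Hello
```

Without the offload, the blocking call would run directly on the event loop thread, which is what previously made the API deaf to terminal signals during generation.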
TabbyAPI
Important
In addition to the README, please read the Wiki page for information about getting started!
Note
Need help? Join the Discord Server and get the Tabby role. Please be nice when asking questions.
A FastAPI-based application that generates text with an LLM (large language model) using the Exllamav2 backend.
Disclaimer
This API is a rolling release. There may be bugs and changes down the line. Be aware that you may need to reinstall dependencies from time to time.
Getting Started
Read the Wiki for more information. It contains user-facing documentation for installation, configuration, sampling, API usage, and so much more.
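As a taste of API usage, a completion request can be constructed as below. The port (5000), route (`/v1/completions`), payload fields, and `x-api-key` header are assumptions for illustration; check the Wiki for the authoritative details:

```python
import json
import urllib.request

# Assumed defaults; verify host, port, route, and auth scheme against the Wiki.
API_URL = "http://127.0.0.1:5000/v1/completions"

payload = {
    "prompt": "Once upon a time",
    "max_tokens": 64,
    "temperature": 0.8,
}

request = urllib.request.Request(
    API_URL,
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json", "x-api-key": "<your-key>"},
    method="POST",
)

# With a TabbyAPI server running, send it like so:
# with urllib.request.urlopen(request) as resp:
#     print(json.load(resp)["choices"][0]["text"])
```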
Supported Model Types
TabbyAPI uses Exllamav2 as a powerful and fast backend for model inference, loading, etc. Therefore, the following types of models are supported:
- Exl2 (Highly recommended)
- GPTQ
- FP16 (using Exllamav2's loader)
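A model is typically selected through the config file. The fragment below is illustrative only; the key names are assumptions, so consult the Wiki for the real schema:

```yaml
# Illustrative only; see the Wiki for the actual configuration keys.
model:
  model_dir: models            # folder containing model directories
  model_name: MyModel-exl2     # directory of the Exl2/GPTQ/FP16 model to load
```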
Alternative Loaders/Backends
If you want to use a different model type than the ones listed above, here are some alternative backends with their own APIs:
- GGUF + GGML - KoboldCPP
- AWQ - Aphrodite Engine
Contributing
If you have issues with the project:
- Describe the issue in detail
- If you have a feature request, please indicate it as such
If you have a pull request:
- Describe the pull request in detail: what you are changing and why
Developers and Permissions
Creators/Developers: