Mirror of https://github.com/theroyallab/tabbyAPI.git (synced 2026-04-29 18:51:53 +00:00)
# TabbyAPI

> [!NOTE]
>
> Need help? Join the [Discord Server](https://discord.gg/sYQxnuD7Fj) and get the `Tabby` role. Please be nice when asking questions.

A FastAPI based application that allows for generating text using an LLM (large language model) using the [exllamav2 backend](https://github.com/turboderp/exllamav2).

## Disclaimer
NOTE: For Flash Attention 2 to work on Windows, CUDA 12.x **must** be installed!
2. Navigate to the project directory: `cd tabbyAPI`
3. Create a python environment:
   1. Through venv (recommended)
      1. `python -m venv venv`
      2. On Windows (using PowerShell or Windows Terminal): `.\venv\Scripts\activate`. On Linux: `source venv/bin/activate`
   2. Through conda
      1. `conda create -n tabbyAPI python=3.11`
      2. `conda activate tabbyAPI`
4. Install the requirements file based on your system:
   1. CUDA 12.x: `pip install -r requirements.txt`
   2. CUDA 11.8: `pip install -r requirements-cu118.txt`
   3. ROCm 5.6: `pip install -r requirements-amd.txt`
5. If you want the `/v1/chat/completions` endpoint to work with a list of messages, install fastchat by running `pip install fschat[model_worker]`
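To keep the step-4 choices straight programmatically, here is a trivial lookup sketch; the `requirements_file` helper and its backend keys are ours, only the file names come from the steps above:

```python
def requirements_file(backend: str) -> str:
    """Map a compute backend to the matching requirements file from the install steps."""
    mapping = {
        "cuda12": "requirements.txt",        # CUDA 12.x
        "cuda11.8": "requirements-cu118.txt",  # CUDA 11.8
        "rocm5.6": "requirements-amd.txt",     # ROCm 5.6
    }
    return mapping[backend]

print(requirements_file("cuda12"))  # requirements.txt
```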
## Configuration
If you do want a config file, copy over `config_sample.yml` to `config.yml`. All…
2. Run the tabbyAPI application: `python main.py`

## Updating

To update tabbyAPI, run `pip install --upgrade -r requirements.txt`, using the requirements file for your configuration (e.g. CUDA 11.8 or ROCm 5.6).

### Update Exllamav2

> [!WARNING]
>
> These instructions are meant for advanced users.

If the installed version of exllamav2 doesn't meet your specifications, you can install the dependency from various sources.

NOTE:

- TabbyAPI will print a warning if a sampler isn't found because the installed exllamav2 version is too old.
- Any upgrade using a requirements file will overwrite your installed wheel. To work around this, change `requirements.txt` locally, create an issue or PR, or reinstall your version of exllamav2 after upgrading.

Here are ways to install exllamav2:

1. From a [wheel/release](https://github.com/turboderp/exllamav2#method-2-install-from-release-with-prebuilt-extension) (recommended)
   1. Find the version that corresponds to your CUDA and Python version. For example, a wheel tagged `cu121` and `cp311` corresponds to CUDA 12.1 and Python 3.11.
2. From [pip](https://github.com/turboderp/exllamav2#method-3-install-from-pypi): `pip install exllamav2`
   1. This is a JIT compiled extension, so the initial launch of tabbyAPI will take some time, and the build may fail if the environment is configured improperly.
3. From [source](https://github.com/turboderp/exllamav2#method-1-install-from-source)
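To double-check which wheel tags apply to your machine, a small sketch; the `wheel_tags` helper is hypothetical (not part of tabbyAPI or exllamav2), and you must supply your CUDA version yourself:

```python
import sys

def wheel_tags(cuda_version: str) -> tuple[str, str]:
    """Return the (cuXYZ, cpXY) tags a matching exllamav2 wheel should carry."""
    major, minor = cuda_version.split(".")[:2]
    cu_tag = f"cu{major}{minor}"  # e.g. "12.1" -> "cu121"
    cp_tag = f"cp{sys.version_info.major}{sys.version_info.minor}"  # running interpreter
    return cu_tag, cp_tag

# On Python 3.11 with CUDA 12.1 this prints ('cu121', 'cp311')
print(wheel_tags("12.1"))
```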
## API Documentation

Docs can be accessed once you launch the API at `http://<your-IP>:<your-port>/docs`
All routes require an API key except for the following, which require an **admin** key
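As a sketch of calling the API with a key from Python: the header name `x-api-key`, the payload fields, and the port are assumptions here, so check your running instance's `/docs` page for the actual schema.

```python
import json
import urllib.request

def build_completion_request(base_url: str, api_key: str, prompt: str,
                             max_tokens: int = 64) -> urllib.request.Request:
    """Build a POST request for the /v1/completions route; send it with urlopen()."""
    body = json.dumps({"prompt": prompt, "max_tokens": max_tokens}).encode()
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=body,
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )

# Assumed local instance and placeholder key; urlopen(req) would perform the call.
req = build_completion_request("http://localhost:5000", "your-api-key", "Hello")
```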
## Common Issues

- AMD cards will error out with flash attention installed, even if the config option is set to False. Run `pip uninstall flash_attn` to remove the wheel from your system.
  - See [#5](https://github.com/theroyallab/tabbyAPI/issues/5)
- Exllamav2 may error with the following exception: `ImportError: DLL load failed while importing exllamav2_ext: The specified module could not be found.`
  - First, check that the wheel matches your Python version and CUDA version, and make sure you're in a venv or conda environment.
  - If those prerequisites are correct, the torch cache may need to be cleared due to a mismatching `exllamav2_ext`.
    - On Windows: find the cache at `C:\Users\<User>\AppData\Local\torch_extensions\torch_extensions\Cache`, where `<User>` is your Windows username.
    - On Linux: find the cache at `~/.cache/torch_extensions`.
    - Look for any folder named `exllamav2_ext` in the python subdirectories and delete it.
  - Restart TabbyAPI and launching should work again.
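The cache cleanup described above can be scripted; this is a sketch (the `clear_exllamav2_cache` helper is ours, and only the cache paths come from the bullets above), so point it at the right cache directory for your OS before running:

```python
import shutil
from pathlib import Path

def clear_exllamav2_cache(cache_root: Path) -> list[Path]:
    """Delete every cached exllamav2_ext build folder under the torch extensions cache."""
    removed = []
    # Materialize the matches first so deletion doesn't disturb the directory walk
    for ext_dir in list(cache_root.rglob("exllamav2_ext")):
        if ext_dir.is_dir():
            shutil.rmtree(ext_dir)
            removed.append(ext_dir)
    return removed

# Linux default; on Windows use
# Path.home() / "AppData/Local/torch_extensions/torch_extensions/Cache"
cache = Path.home() / ".cache" / "torch_extensions"
if cache.exists():
    print("removed:", clear_exllamav2_cache(cache))
```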
## Contributing
If you have a Pull Request

- Describe the pull request in detail: what you are changing and why
## Developers and Permissions

Creators/Developers: