From 21c25fd806dc0fcbe7b1774847057474c58d374e Mon Sep 17 00:00:00 2001
From: kingbri
Date: Wed, 6 Dec 2023 00:21:52 -0500
Subject: [PATCH] Update README

Signed-off-by: kingbri
---
 README.md | 100 ++++++++++++++++++++++++++++++++----------------------
 1 file changed, 60 insertions(+), 40 deletions(-)

diff --git a/README.md b/README.md
index eeb4f8e..f8f2235 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,10 @@
 # TabbyAPI
 
-A FastAPI based application that allows for generating text using an LLM (large language model) using the [exllamav2 backend](https://github.com/turboderp/exllamav2).
+> [!NOTE]
+>
+> Need help? Join the [Discord Server](https://discord.gg/sYQxnuD7Fj) and get the `Tabby` role. Please be nice when asking questions.
+
+A FastAPI-based application for generating text with an LLM (large language model) using the [exllamav2 backend](https://github.com/turboderp/exllamav2).
 
 ## Disclaimer
@@ -29,38 +33,28 @@ NOTE: For Flash Attention 2 to work on Windows, CUDA 12.x **must** be installed!
 2. Navigate to the project directory: `cd tabbyAPI`
 
 3. Create a python environment:
-    
+
    1. Through venv (recommended)
-    
+
       1. `python -m venv venv`
-    
+
      2. On Windows (Using powershell or Windows terminal): `.\venv\Scripts\activate`. On Linux: `source venv/bin/activate`
-    
+
    2. Through conda
-    
+
       1. `conda create -n tabbyAPI python=3.11`
-    
+
       2. `conda activate tabbyAPI`
 
-4. Install torch using the instructions found [here](https://pytorch.org/get-started/locally/)
-
-5. Install exllamav2 (must be v0.0.9 or greater!)
+4. Install the requirements file based on your system:
 
-   NOTE: TabbyAPI will give you a warning if a sampler isn't found due to the exllamav2 version being too low.
-
-   1. From a [wheel/release](https://github.com/turboderp/exllamav2#method-2-install-from-release-with-prebuilt-extension) (Recommended)
-
-      1. Find the version that corresponds with your cuda and python version. For example, a wheel with `cu121` and `cp311` corresponds to CUDA 12.1 and python 3.11
+   1. CUDA 12.x: `pip install -r requirements.txt`
 
-   2. From [pip](https://github.com/turboderp/exllamav2#method-3-install-from-pypi): `pip install exllamav2`
-
-      1. This is a JIT compiled extension, which means that the initial launch of tabbyAPI will take some time. The build may also not work due to improper environment configuration.
+   2. CUDA 11.8: `pip install -r requirements-cu118.txt`
 
-   3. From [source](https://github.com/turboderp/exllamav2#method-1-install-from-source)
+   3. ROCm 5.6: `pip install -r requirements-amd.txt`
 
-6. Install the other requirements via: `pip install -r requirements.txt`
-
-7. If you want the `/v1/chat/completions` endpoint to work with a list of messages, install fastchat by running `pip install fschat[model_worker]`
+5. If you want the `/v1/chat/completions` endpoint to work with a list of messages, install fastchat by running `pip install fschat[model_worker]`
@@ -74,6 +68,36 @@ If you do want a config file, copy over `config_sample.yml` to `config.yml`. All
 
 2. Run the tabbyAPI application: `python main.py`
 
+## Updating
+
+To update tabbyAPI, run `pip install --upgrade -r requirements.txt`, using the requirements file that matches your configuration (e.g. CUDA 11.8 or ROCm 5.6).
+
+### Update Exllamav2
+
+> [!WARNING]
+>
+> These instructions are meant for advanced users.
+
+If the installed version of exllamav2 doesn't meet your needs, you can install the dependency from various sources.
+
+NOTE:
+
+- TabbyAPI will print a warning if a sampler isn't found because the exllamav2 version is too low.
+
+- Any upgrade using a requirements file will overwrite your installed wheel. To avoid this, edit `requirements.txt` locally, create an issue or PR, or reinstall your version of exllamav2 after upgrading.
+
+Here are ways to install exllamav2:
+
+1. From a [wheel/release](https://github.com/turboderp/exllamav2#method-2-install-from-release-with-prebuilt-extension) (Recommended)
+
+   1. Find the version that corresponds to your CUDA and Python versions. For example, a wheel with `cu121` and `cp311` corresponds to CUDA 12.1 and Python 3.11
+
+2. From [pip](https://github.com/turboderp/exllamav2#method-3-install-from-pypi): `pip install exllamav2`
+
+   1. This is a JIT-compiled extension, which means that the initial launch of tabbyAPI will take some time. The build may also fail due to improper environment configuration.
+
+3. From [source](https://github.com/turboderp/exllamav2#method-1-install-from-source)
+
 ## API Documentation
 
 Docs can be accessed once you launch the API at `http://<ip>:<port>/docs`
@@ -105,22 +129,22 @@ All routes require an API key except for the following which require an **admin*
 ## Common Issues
 
 - AMD cards will error out with flash attention installed, even if the config option is set to False. Run `pip uninstall flash_attn` to remove the wheel from your system.
-  
-  - See [#5](https://github.com/theroyallab/tabbyAPI/issues/5)
+
+  - See [#5](https://github.com/theroyallab/tabbyAPI/issues/5)
 
 - Exllamav2 may error with the following exception: `ImportError: DLL load failed while importing exllamav2_ext: The specified module could not be found.`
-  
+
   - First, make sure to check if the wheel is equivalent to your python version and CUDA version. Also make sure you're in a venv or conda environment.
+
+  - If those prerequisites are correct, the torch cache may need to be cleared. This is due to a mismatched exllamav2_ext.
+
+    - In Windows: Find the cache at `C:\Users\<username>\AppData\Local\torch_extensions\torch_extensions\Cache` where `<username>` is your Windows username
+
+    - In Linux: Find the cache at `~/.cache/torch_extensions`
+
+    - Look for any folder named `exllamav2_ext` in the python subdirectories and delete them.
+
+  - Restart TabbyAPI and launching should work again.
 
 ## Contributing
 
 If you have a Pull Request
 
 - Describe the pull request in detail, what, and why you are changing something
 
-## Support
-
-Need help? Join the [Discord Server](https://discord.gg/sYQxnuD7Fj) and get the `Tabby` role. Please be nice when asking questions.
-
 ## Developers and Permissions
 
 Creators/Developers:
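Note on the `/v1/chat/completions` endpoint mentioned in the patch: it accepts a list of messages once fastchat is installed. As a rough illustration only (not taken from the patch), a client request body in the OpenAI-style schema might be built like this; the model name and sampling parameters below are placeholder assumptions, not values from TabbyAPI's docs:

```python
import json

# Hypothetical request body for /v1/chat/completions, assuming an
# OpenAI-compatible message schema. "my-exl2-model" is a placeholder.
payload = {
    "model": "my-exl2-model",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    "max_tokens": 64,
    "temperature": 0.7,
}

# Serialize to JSON; a client would POST this to the running server
# with the API key header the server requires.
body = json.dumps(payload)
print(body)
```

A client would send this body to `http://<ip>:<port>/v1/chat/completions` on the running server, supplying the appropriate API key.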