SillyTavern - Extras
What is this
A set of APIs for various SillyTavern extensions.
You need to run the latest version of my TavernAI fork. Grab it here: Direct link to ZIP, Git repository
All modules require at least 6 GB of VRAM to run. With Stable Diffusion disabled, it will probably fit in 4 GB. Alternatively, everything can also be run on the CPU.
Try on Colab (will give you a link to Extras API):
Colab link: https://colab.research.google.com/github/Cohee1207/SillyTavern/blob/main/colab/GPU.ipynb
How to run
❗ IMPORTANT!
Default requirements.txt contains only basic packages for text processing
If you want to use the most advanced features (like Stable Diffusion, TTS), change that to requirements-complete.txt in commands below. See Modules section for more details.
Getting an error when installing from requirements-complete.txt?
ERROR: Could not build wheels for hnswlib, which is required to install pyproject.toml-based projects
Installing the chromadb package requires one of the following:
- Visual C++ build tools installed: https://visualstudio.microsoft.com/visual-cpp-build-tools/
- hnswlib installed from conda:
conda install -c conda-forge hnswlib
Missing modules reported by SillyTavern extensions menu?
You must specify a list of module names to run via the `--enable-modules` argument (`caption` is provided as an example). See the Modules section.
☁️ Colab
- Open colab link
- Select desired "extra" options and start the cell
- Wait for it to finish
- Get an API URL link from the colab output under the `### SillyTavern Extensions LINK ###` title
- Start SillyTavern with extensions support: set `enableExtensions` to `true` in config.conf
- Navigate to the SillyTavern extensions menu, enter the API URL, and tap "Connect" to load the extensions
What about mobile/Android/Termux? 🤔
There are some folks in the community having success running Extras on their phones via Ubuntu on Termux. This project wasn't made with mobile support in mind, so this guide is provided strictly for your information only: https://rentry.org/STAI-Termux#downloading-and-running-tai-extras
❗ IMPORTANT!
I will not provide any support for running this on Android. Direct all your questions to the creator of this guide.
💻 Locally
Option 1 - Conda (recommended) 🐍
PREREQUISITES
- Install Miniconda: https://docs.conda.io/en/latest/miniconda.html
- (Important!) Read how to use Conda: https://conda.io/projects/conda/en/latest/user-guide/getting-started.html
- Install git: https://git-scm.com/downloads
EXECUTE THESE COMMANDS ONE BY ONE IN THE CONDA COMMAND PROMPT.
TYPE/PASTE EACH COMMAND INTO THE PROMPT, HIT ENTER AND WAIT FOR IT TO FINISH!
- Before the first run, create an environment (let's call it `extras`):
conda create -n extras
- Now activate the newly created env
conda activate extras
- Install the required system packages
conda install pytorch=2.0.0 torchvision=0.15.0 torchaudio=2.0.0 pytorch-cuda=11.7 git -c pytorch -c nvidia
- Clone this repository
git clone https://github.com/Cohee1207/SillyTavern-extras
- Navigate to the freshly cloned repository
cd SillyTavern-extras
- Install the project requirements
pip install -r requirements.txt
- Run the Extensions API server
python server.py --enable-modules=caption,summarize,classify
- Copy the Extras server API URL listed in the console window after it finishes loading up. On local installs, this defaults to `http://localhost:5100`
- Open your SillyTavern config.conf file (located in the base install folder), and look for the line `const enableExtensions`. Make sure that line has `= true`, not `= false`
- Start your SillyTavern server
- Open the Extensions panel (via the 'Stacked Blocks' icon at the top of the page), paste the API URL into the input box, and click "Connect" to connect to the Extras extension server.
- To run again, simply activate the environment and run these commands. Be sure to include any additional options for server.py (see below) that your setup requires.
conda activate extras
python server.py
Option 2 - Vanilla 🍦
- Install Python 3.10: https://www.python.org/downloads/release/python-31010/
- Install git: https://git-scm.com/downloads
- Clone the repo:
git clone https://github.com/Cohee1207/SillyTavern-extras
cd SillyTavern-extras
- Run `python -m pip install -r requirements.txt`
- Run `python server.py --enable-modules=caption,summarize,classify`
- Get the API URL. Defaults to `http://localhost:5100` if you run locally
- Start SillyTavern with extensions support: set `enableExtensions` to `true` in config.conf
- Navigate to the SillyTavern extensions menu, enter the API URL, and tap "Connect" to load the extensions
Modules
| Name | Description | Included in default requirements.txt |
|---|---|---|
| `caption` | Image captioning | ✔️ Yes |
| `summarize` | Text summarization | ✔️ Yes |
| `classify` | Text sentiment classification | ✔️ Yes |
| `sd` | Stable Diffusion image generation | ❌ No (✔️ remote) |
| `silero-tts` | Silero TTS server | ❌ No |
| `edge-tts` | Microsoft Edge TTS client | ✔️ Yes |
| `chromadb` | Infinity context server | ❌ No |
Additional options
| Flag | Description |
|---|---|
| `--enable-modules` | **Required.** Comma-separated list of module names to enable. See Modules. Example: `--enable-modules=caption,sd` |
| `--port` | Specify the port on which the application is hosted. Default: `5100` |
| `--listen` | Host the app on the local network |
| `--share` | Share the app on a Cloudflare tunnel |
| `--secure` | Add API key authentication. Highly recommended when paired with `--share`! |
| `--cpu` | Run the models on the CPU instead of CUDA |
| `--summarization-model` | Load a custom summarization model. Expects a HuggingFace model ID. Default: `Qiliang/bart-large-cnn-samsum-ChatGPT_v3` |
| `--classification-model` | Load a custom sentiment classification model. Expects a HuggingFace model ID. Default (6 emotions): `nateraw/bert-base-uncased-emotion`. Another solid option (28 emotions): `joeddav/distilbert-base-uncased-go-emotions-student`. For Chinese: `touch20032003/xuyuan-trial-sentiment-bert-chinese` |
| `--captioning-model` | Load a custom captioning model. Expects a HuggingFace model ID. Default: `Salesforce/blip-image-captioning-large` |
| `--keyphrase-model` | Load a custom key phrase extraction model. Expects a HuggingFace model ID. Default: `ml6team/keyphrase-extraction-distilbert-inspec` |
| `--prompt-model` | Load a custom prompt generation model. Expects a HuggingFace model ID. Default: `FredZhang7/anime-anything-promptgen-v2` |
| `--embedding-model` | Load a custom text embedding model. Expects a HuggingFace model ID. Default: `sentence-transformers/all-mpnet-base-v2` |
| `--chroma-host` | Specify a host IP for a remote ChromaDB server |
| `--chroma-port` | Specify an HTTP port for a remote ChromaDB server. Default: `8000` |
| `--sd-model` | Load a custom Stable Diffusion image generation model. Expects a HuggingFace model ID. Default: `ckpt/anything-v4.5-vae-swapped`. Must have the VAE pre-baked in PyTorch format or the output will look drab! |
| `--sd-cpu` | Force the Stable Diffusion generation pipeline to run on the CPU. SLOW! |
| `--sd-remote` | Use a remote SD backend. Supported APIs: `sd-webui` |
| `--sd-remote-host` | Specify the host of the remote SD backend. Default: `127.0.0.1` |
| `--sd-remote-port` | Specify the port of the remote SD backend. Default: `7860` |
| `--sd-remote-ssl` | Use SSL for the remote SD backend. Default: `False` |
| `--sd-remote-auth` | Specify `username:password` for the remote SD backend (if required) |
ChromaDB
ChromaDB is a blazing-fast, open-source vector database used for long-term memory when chatting with characters. It can run in-memory or on a local server on your LAN.
NOTE: You should NOT run ChromaDB on a cloud server. There are no methods for authentication (yet), so unless you want to expose an unauthenticated ChromaDB to the world, run this on a local server in your LAN.
In-memory setup
Run the extras server with the chromadb module enabled.
Remote setup
Prerequisites: Docker, Docker compose (make sure you're running in rootless mode with the systemd service enabled if on Linux)
Steps:
- Run `git clone https://github.com/chroma-core/chroma chromadb` and `cd chromadb`
- Run `docker-compose up -d --build` to build ChromaDB. This may take a long time depending on your system
- Once the build process is finished, ChromaDB should be running in the background. You can check with the command `docker ps`
- On your client machine, specify your local server IP in the `--chroma-host` argument (e.g. `--chroma-host=192.168.1.10`)

If you are running ChromaDB on the same machine as SillyTavern, you will have to change the port of one of the services. To do this for ChromaDB:
- Run `docker ps` to get the container ID, then `docker container stop <container ID>`
- Enter the ChromaDB git repository: `cd chromadb`
- Open `docker-compose.yml` and look for the line starting with `uvicorn chromadb.app:app`
- Change the `--port` argument to whatever port you want
- Look for the `ports` category and change the occurrences of `8000` to the port you chose in the previous step
- Save and exit, then run `docker-compose up --detach`
- On your client machine, be sure to specify the `--chroma-port` argument (e.g. `--chroma-port=<your-port-here>`) along with the `--chroma-host` argument
API Endpoints
Get active list
GET /api/modules
Input
None
Output
{"modules":["caption", "classify", "summarize"]}
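A client can compare the active list against the modules it needs before enabling features. A minimal sketch in Python, using the example response above (`missing_modules` is a hypothetical helper, not part of the Extras API):

```python
# Example /api/modules response, taken from the output shown above.
response = {"modules": ["caption", "classify", "summarize"]}

def missing_modules(resp, required):
    """Return the required module names absent from the active list."""
    return [m for m in required if m not in resp["modules"]]

missing = missing_modules(response, ["caption", "sd"])
# "sd" was not enabled on the server in this example
```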
Image captioning
POST /api/caption
Input
{ "image": "base64 encoded image" }
Output
{ "caption": "caption of the posted image" }
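Building the request body amounts to base64-encoding the raw image bytes. A sketch, assuming the server runs at the default local address (the placeholder bytes stand in for a real image file):

```python
import base64
import json

# Placeholder bytes standing in for a real image file; in practice,
# read them with open("image.png", "rb").read().
image_bytes = b"placeholder image bytes"

# The API expects the raw image bytes as a base64 string.
payload = json.dumps({"image": base64.b64encode(image_bytes).decode("ascii")})
# POST `payload` to http://localhost:5100/api/caption
# with the Content-Type: application/json header.
```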
Text summarization
POST /api/summarize
Input
{ "text": "text to be summarized", "params": {} }
Output
{ "summary": "summarized text" }
Optional: params object for control over summarization:
| Name | Default value |
|---|---|
| `temperature` | 1.0 |
| `repetition_penalty` | 1.0 |
| `max_length` | 500 |
| `min_length` | 200 |
| `length_penalty` | 1.5 |
| `bad_words` | `["\n", '"', "*", "[", "]", "{", "}", ":", "(", ")", "<", ">"]` |
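A request body with explicit params can be sketched like this, using the defaults from the table above (the sample text is illustrative):

```python
import json

# Defaults from the table above; override only the fields you need.
params = {
    "temperature": 1.0,
    "repetition_penalty": 1.0,
    "max_length": 500,
    "min_length": 200,
    "length_penalty": 1.5,
    "bad_words": ["\n", '"', "*", "[", "]", "{", "}", ":", "(", ")", "<", ">"],
}
body = json.dumps({"text": "Long chat log to be summarized...", "params": params})
# POST `body` to http://localhost:5100/api/summarize
```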
Text sentiment classification
POST /api/classify
Input
{ "text": "text to classify sentiment of" }
Output
{
"classification": [
{
"label": "joy",
"score": 1.0
},
{
"label": "anger",
"score": 0.7
},
{
"label": "love",
"score": 0.6
},
{
"label": "sadness",
"score": 0.5
},
{
"label": "fear",
"score": 0.4
},
{
"label": "surprise",
"score": 0.3
}
]
}
NOTES
- Sorted by descending score order
- List of categories is defined by the classification model
- Value range from 0.0 to 1.0
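Since the list arrives sorted by descending score, picking the dominant emotion is a one-liner; a sketch using a shortened copy of the example response above:

```python
# Example /api/classify response from above; entries arrive sorted by
# descending score, so the first label is the dominant emotion.
classification = [
    {"label": "joy", "score": 1.0},
    {"label": "anger", "score": 0.7},
    {"label": "love", "score": 0.6},
]

top_label = classification[0]["label"]
# Equivalent but order-independent:
top_label_checked = max(classification, key=lambda c: c["score"])["label"]
```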
Stable Diffusion image generation
POST /api/image
Input
{ "prompt": "prompt to be generated", "sampler": "DDIM", "steps": 20, "scale": 6, "model": "model_name" }
Output
{ "image": "base64 encoded image" }
NOTES
- Only the "prompt" parameter is required
- Both "sampler" and "model" parameters only work when using a remote SD backend
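Decoding the response back into an image file is the mirror of the caption request; a sketch with stand-in bytes in place of real PNG data:

```python
import base64

# Stand-in for the "image" field of an /api/image response; a real
# response carries actual PNG data.
raw = b"\x89PNG\r\n\x1a\n...image data..."
response = {"image": base64.b64encode(raw).decode("ascii")}

image_bytes = base64.b64decode(response["image"])
# To save it:
# with open("generated.png", "wb") as f:
#     f.write(image_bytes)
```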
Get available Stable Diffusion models
GET /api/image/models
Output
{ "models": [list of all available model names] }
Get available Stable Diffusion samplers
GET /api/image/samplers
Output
{ "samplers": [list of all available sampler names] }
Get currently loaded Stable Diffusion model
GET /api/image/model
Output
{ "model": "name of the current loaded model" }
Load a Stable Diffusion model (remote)
POST /api/image/model
Input
{ "model": "name of the model to load" }
Output
{ "previous_model": "name of the previous model", "current_model": "name of the newly loaded model" }
Generate Silero TTS voice
POST /api/tts/generate
Input
{ "speaker": "speaker voice_id", "text": "text to narrate" }
Output
WAV audio file.
Get Silero TTS voices
GET /api/tts/speakers
Output
[
{
"name": "en_0",
"preview_url": "http://127.0.0.1:5100/api/tts/sample/en_0",
"voice_id": "en_0"
}
]
Get Silero TTS voice sample
GET /api/tts/sample/<voice_id>
Output
WAV audio file.
Add messages to chromadb
POST /api/chromadb
Input
{
"chat_id": "chat1 - 2023-12-31",
"messages": [
{
"id": "633a4bd1-8350-46b5-9ef2-f5d27acdecb7",
"date": 1684164339877,
"role": "user",
"content": "Hello, AI world!",
"meta": "this is meta"
},
{
"id": "8a2ed36b-c212-4a1b-84a3-0ffbe0896506",
"date": 1684164411759,
"role": "assistant",
"content": "Hello, Hooman!"
}
]
}
Output
{ "count": 2 }
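A request body in the shape shown above can be assembled like this (`make_message` is a hypothetical helper; the `meta` field appears optional in the example):

```python
import json
import time
import uuid

def make_message(role, content, meta=None):
    """Build one message in the shape /api/chromadb expects."""
    msg = {
        "id": str(uuid.uuid4()),
        "date": int(time.time() * 1000),  # milliseconds since epoch, as above
        "role": role,
        "content": content,
    }
    if meta is not None:
        msg["meta"] = meta
    return msg

body = json.dumps({
    "chat_id": "chat1 - 2023-12-31",
    "messages": [
        make_message("user", "Hello, AI world!", meta="this is meta"),
        make_message("assistant", "Hello, Hooman!"),
    ],
})
# POST `body` to http://localhost:5100/api/chromadb
```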
Query chromadb
POST /api/chromadb/query
Input
{
"chat_id": "chat1 - 2023-12-31",
"query": "Hello",
"n_results": 2
}
Output
[
{
"id": "633a4bd1-8350-46b5-9ef2-f5d27acdecb7",
"date": 1684164339877,
"role": "user",
"content": "Hello, AI world!",
"distance": 0.31,
"meta": "this is meta"
},
{
"id": "8a2ed36b-c212-4a1b-84a3-0ffbe0896506",
"date": 1684164411759,
"role": "assistant",
"content": "Hello, Hooman!",
"distance": 0.29
}
]
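A client might filter the returned messages by distance before injecting them into the prompt; a sketch using a shortened copy of the example response above (the 0.30 cutoff is a hypothetical threshold):

```python
# Example /api/chromadb/query response from above; lower distance means
# a closer semantic match to the query.
results = [
    {"role": "user", "content": "Hello, AI world!", "distance": 0.31},
    {"role": "assistant", "content": "Hello, Hooman!", "distance": 0.29},
]

# Keep only results under a similarity threshold (a hypothetical cutoff).
close_matches = [r for r in results if r["distance"] < 0.30]
```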
Delete the messages from chromadb
POST /api/chromadb/purge
Input
{ "chat_id": "chat1 - 2023-04-12" }
Get a list of Edge TTS voices
GET /api/edge-tts/list
Output
[{"Name": "Microsoft Server Speech Text to Speech Voice (af-ZA, AdriNeural)", "ShortName": "af-ZA-AdriNeural", "Gender": "Female", "Locale": "af-ZA", "SuggestedCodec": "audio-24khz-48kbitrate-mono-mp3", "FriendlyName": "Microsoft Adri Online (Natural) - Afrikaans (South Africa)", "Status": "GA", "VoiceTag": {"ContentCategories": ["General"], "VoicePersonalities": ["Friendly", "Positive"]}}]
Generate Edge TTS voice
POST /api/edge-tts/generate
Input
{ "text": "Text to narrate", "voice": "af-ZA-AdriNeural" }
Output
MP3 audio file.