mirror of https://github.com/SillyTavern/SillyTavern-Extras.git synced 2026-01-26 17:20:04 +00:00




SillyTavern - Extras

Recent news

November 20 2023 - The project is relicensed as AGPLv3 to comply with the rest of ST organization policy. If you have any concerns about that, please raise a discussion in the appropriate channel.
November 16 2023 - Requirement files were remade from scratch to simplify the process of local installation.
- Removed requirements-complete.txt, please use requirements.txt instead.
- Unlocked versions of all requirements unless strictly necessary.
- Coqui TTS requirements moved to requirements-coqui.txt.
July 25 2023 - Extras now requires Python 3.11 to run; some of the new modules are incompatible with old Python 3.10 installs. To migrate using conda, remove the old environment with conda remove --name extras --all and reinstall using the instructions below.

What is this

A set of APIs for various SillyTavern extensions.

You need to run the latest version of SillyTavern. Grab it here: How to install, Git repository

All modules, except for Stable Diffusion, run on the CPU by default. However, they can alternatively be configured to use CUDA (with --cuda command line option). When running all modules simultaneously, you can expect a usage of approximately 6 GB of RAM. Loading Stable Diffusion adds an additional couple of GB to the memory usage.

Try on Colab (will give you a link to Extras API):

Colab link: https://colab.research.google.com/github/SillyTavern/SillyTavern/blob/release/colab/GPU.ipynb

Documentation: https://docs.sillytavern.app/

How to run

❗ IMPORTANT! Requirement files explained

requirements.txt installs PyTorch with CUDA support by default.
If you run on an AMD GPU, use the requirements-rocm.txt file instead.
If you run on Apple Silicon (ARM series), use the requirements-silicon.txt file instead.
If you want to use Coqui TTS, install requirements-coqui.txt after choosing the requirements from the list above.
If you want to use RVC, install requirements-rvc.txt after choosing the requirements from the list above.
BE WARNED THAT:
- Coqui package is extremely unstable and may break other packages or not work at all in your environment.
- It's not really worth it.

Common errors when installing requirements

ERROR: Could not build wheels for hnswlib, which is required to install pyproject.toml-based projects

Installing the chromadb package requires one of the following:

Install the Visual C++ build tools: https://visualstudio.microsoft.com/visual-cpp-build-tools/
Install hnswlib from conda: conda install -c conda-forge hnswlib

Missing modules reported by SillyTavern extensions menu?

You must specify the list of module names to run with the --enable-modules option (caption is provided as an example). See the Modules section.

☁️ Colab

Open colab link
Select desired "extra" options and start the cell
Wait for it to finish
Get an API URL link from colab output under the ### SillyTavern Extensions LINK ### title
Start SillyTavern with extensions support: set enableExtensions to true in config.conf
Navigate to SillyTavern extensions menu and put in an API URL and tap "Connect" to load the extensions

What about mobile/Android/Termux? 🤔

There are some folks in the community having success running Extras on their phones via Ubuntu on Termux. This project wasn't made with mobile support in mind, so this guide is provided strictly for your information only: https://rentry.org/STAI-Termux#downloading-and-running-tai-extras

❗ IMPORTANT!

We will NOT provide any support for running this on Android. Direct all your questions to the creator of this guide.

Talkinghead module on Linux

The talkinghead module requires an additional package that is not installed automatically because it is incompatible with Colab. Run this after you install the other requirements:

pip install wxpython==4.2.1

💻 Locally

Option 1 - Conda (recommended) 🐍

PREREQUISITES

Install Miniconda: https://docs.conda.io/en/latest/miniconda.html
(Important!) Read how to use Conda: https://conda.io/projects/conda/en/latest/user-guide/getting-started.html
Install git: https://git-scm.com/downloads

EXECUTE THESE COMMANDS ONE BY ONE IN THE CONDA COMMAND PROMPT.

TYPE/PASTE EACH COMMAND INTO THE PROMPT, HIT ENTER AND WAIT FOR IT TO FINISH!

Before the first run, create an environment (let's call it extras):

conda create -n extras

Now activate the newly created env

conda activate extras

Install Python 3.11

conda install python=3.11

Install the required system packages

conda install git

Clone this repository

git clone https://github.com/SillyTavern/SillyTavern-extras

Navigate to the freshly cloned repository

cd SillyTavern-extras

Install the project requirements

pip install -r requirements.txt

Run the Extensions API server

python server.py --enable-modules=caption,summarize,classify

Copy the Extras server API URL listed in the console window after it finishes loading up. On local installs, this defaults to http://localhost:5100.
Open your SillyTavern config.conf file (located in the base install folder), and look for a line "const enableExtensions". Make sure that line has "= true", and not "= false".
Start your SillyTavern server
Open the Extensions panel (via the 'Stacked Blocks' icon at the top of the page), paste the API URL into the input box, and click "Connect" to connect to the Extras extension server.
To run again, simply activate the environment and run these commands. Be sure to add the additional options for server.py (see below) that your setup requires.

conda activate extras
python server.py

Option 2 - Vanilla 🍦

Install Python 3.11: https://www.python.org/downloads/release/python-3114/
Install git: https://git-scm.com/downloads
Clone the repo:

git clone https://github.com/SillyTavern/SillyTavern-extras
cd SillyTavern-extras

Run python -m pip install -r requirements.txt
Run python server.py --enable-modules=caption,summarize,classify
Get the API URL. Defaults to http://localhost:5100 if you run locally.
Start SillyTavern with extensions support: set enableExtensions to true in config.conf
Navigate to the SillyTavern extensions menu and put in an API URL and tap "Connect" to load the extensions

Modules

Name	Description
`caption`	Image captioning
`summarize`	Text summarization
`classify`	Text sentiment classification
`sd`	Stable Diffusion image generation (remote A1111 server by default)
`silero-tts`	Silero TTS server
`chromadb`	Vector storage server
`talkinghead`	Talking Head Sprites
`edge-tts`	Microsoft Edge TTS client
`coqui-tts`	Coqui TTS server
`rvc`	Real-time voice cloning
`websearch`	Google search using Selenium headless browser

Additional options

Flag	Description
`--enable-modules`	Required option. Expects a comma-separated list of the module names to enable. See Modules. Example: `--enable-modules=caption,sd`
`--port`	Specify the port on which the application is hosted. Default: 5100
`--listen`	Host the app on the local network
`--share`	Share the app on CloudFlare tunnel
`--secure`	Adds API key authentication requirements. Highly recommended when paired with share!
`--cpu`	Run the models on the CPU instead of CUDA. Enabled by default.
`--mps` or `--m1`	Run the models on Apple Silicon. Only for M1 and M2 processors.
`--cuda`	Uses CUDA (GPU+VRAM) to run modules if it is available. Otherwise, falls back to using CPU.
`--cuda-device`	Specifies a CUDA device to use. Defaults to `cuda:0` (first available GPU).
`--talkinghead-gpu`	Uses GPU for talkinghead (10x FPS increase in animation).
`--coqui-gpu`	Uses GPU for coqui TTS (if available).
`--coqui-model`	If provided, downloads and preloads a coqui TTS model. Default: none. Example: `tts_models/multilingual/multi-dataset/bark`
`--summarization-model`	Load a custom summarization model. Expects a HuggingFace model ID. Default: Qiliang/bart-large-cnn-samsum-ChatGPT_v3
`--classification-model`	Load a custom sentiment classification model. Expects a HuggingFace model ID. Default (6 emotions): nateraw/bert-base-uncased-emotion; another solid option (28 emotions): joeddav/distilbert-base-uncased-go-emotions-student; for Chinese language: touch20032003/xuyuan-trial-sentiment-bert-chinese
`--captioning-model`	Load a custom captioning model. Expects a HuggingFace model ID. Default: Salesforce/blip-image-captioning-large
`--embedding-model`	Load a custom text embedding model. Expects a HuggingFace model ID. Default: sentence-transformers/all-mpnet-base-v2
`--chroma-host`	Specifies a host IP for a remote ChromaDB server.
`--chroma-port`	Specifies an HTTP port for a remote ChromaDB server. Default: `8000`
`--sd-model`	Load a custom Stable Diffusion image generation model. Expects a HuggingFace model ID. Default: ckpt/anything-v4.5-vae-swapped. The model must have VAE pre-baked in PyTorch format or the output will look drab!
`--sd-cpu`	Force the Stable Diffusion generation pipeline to run on the CPU. SLOW!
`--sd-remote`	Use a remote SD backend. Supported APIs: sd-webui
`--sd-remote-host`	Specify the host of the remote SD backend. Default: 127.0.0.1
`--sd-remote-port`	Specify the port of the remote SD backend. Default: 7860
`--sd-remote-ssl`	Use SSL for the remote SD backend. Default: False
`--sd-remote-auth`	Specify the `username:password` for the remote SD backend (if required)

Coqui TTS

Running on Mac M1

ImportError: symbol not found

If you're getting the following error when running coqui-tts module on M1 Mac:

ImportError: dlopen(/Users/user/.../lib/python3.11/site-packages/MeCab/_MeCab.cpython-311-darwin.so, 0x0002): symbol not found in flat namespace '__ZN5MeCab11createModelEPKc'

Do the following:

Install homebrew: https://brew.sh/
Build and install the mecab package

brew install --build-from-source mecab
ARCHFLAGS='-arch arm64' pip install --no-binary :all: --compile --use-pep517 --no-cache-dir --force mecab-python3

ChromaDB

ChromaDB is a blazing fast and open source database that is used for long-term memory when chatting with characters. It can be run in-memory or on a local server on your LAN.

NOTE: You should NOT run ChromaDB on a cloud server. There are no methods for authentication (yet), so unless you want to expose an unauthenticated ChromaDB to the world, run this on a local server in your LAN.

In-memory setup

Run the extras server with the chromadb module enabled (recommended).

Remote setup

Use this if you want to use ChromaDB with docker or host it remotely. If you don't know what that means and only want to use ChromaDB with ST on your local device, use the 'in-memory' instructions instead.

Prerequisites: Docker, Docker compose (make sure you're running in rootless mode with the systemd service enabled if on Linux).

Steps:

Run git clone https://github.com/chroma-core/chroma chromadb and cd chromadb
Run docker-compose up -d --build to build ChromaDB. This may take a long time depending on your system
Once the build process is finished, ChromaDB should be running in the background. You can check with the command docker ps
On your client machine, specify your local server ip in the --chroma-host argument (ex. --chroma-host=192.168.1.10)

If you are running ChromaDB on the same machine as SillyTavern, you will have to change the port of one of the services. To do this for ChromaDB:

Run docker ps to get the container ID and then docker container stop <container ID>
Enter the ChromaDB git repository: cd chromadb
Open docker-compose.yml and look for the line starting with uvicorn chromadb.app:app
Change the --port argument to whatever port you want.
Look for the ports category and change the occurrences of 8000 to whatever port you chose in step 4.
Save and exit. Then run docker-compose up --detach
On your client machine, make sure to specify the --chroma-port argument (ex. --chroma-port=<your-port-here>) along with the --chroma-host argument.

API Endpoints

Get active list

GET /api/modules

Input

None

Output

{"modules":["caption", "classify", "summarize"]}
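
A minimal client sketch for this endpoint, assuming a local server on the default http://localhost:5100 without `--secure`. The helper names are illustrative and not part of Extras; the parsing is checked offline against the sample response above.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5100"  # default for local installs


def parse_modules(body: bytes) -> list[str]:
    """Extract the "modules" array from the JSON response body."""
    return json.loads(body)["modules"]


def fetch_modules(base_url: str = BASE_URL) -> list[str]:
    """Call GET /api/modules and return the enabled module names."""
    with urllib.request.urlopen(f"{base_url}/api/modules", timeout=10) as resp:
        return parse_modules(resp.read())


def missing_modules(enabled: list[str], wanted: list[str]) -> list[str]:
    """Return the wanted modules that the server did not report."""
    return [name for name in wanted if name not in enabled]


# Offline check against the sample response shown above.
sample = b'{"modules":["caption", "classify", "summarize"]}'
enabled = parse_modules(sample)
print(missing_modules(enabled, ["caption", "sd"]))  # ['sd']
```

With a running server, fetch_modules() returns the live list instead of the sample.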

Image captioning

POST /api/caption

Input

{ "image": "base64 encoded image" }

Output

{ "caption": "caption of the posted image" }
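
The request body can be prepared as sketched below. This assumes the server expects plain base64 text (no data: URL prefix), which the README does not spell out; the function names are hypothetical.

```python
import base64
import json


def caption_payload(image_bytes: bytes) -> str:
    """Build the JSON body for POST /api/caption from raw image bytes."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps({"image": encoded})


def read_caption(response_body: str) -> str:
    """Pull the caption text out of the JSON response."""
    return json.loads(response_body)["caption"]


# Stand-in data: the 8-byte PNG signature instead of a real image file.
raw = b"\x89PNG\r\n\x1a\n"
body = caption_payload(raw)
print(body)
```

For a real image, read the file in binary mode and pass its bytes to caption_payload.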

Text summarization

POST /api/summarize

Input

{ "text": "text to be summarized", "params": {} }

Output

{ "summary": "summarized text" }

Optional: `params` object for control over summarization:

Name	Default value
`temperature`	1.0
`repetition_penalty`	1.0
`max_length`	500
`min_length`	200
`length_penalty`	1.5
`bad_words`	["\n", '"', "*", "[", "]", "{", "}", ":", "(", ")", "<", ">"]
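
One way to build the request body is to send only the parameters you want to change and reject misspelled names early. This sketch assumes the server fills in the defaults from the table for anything left out of `params`; the function name is illustrative.

```python
import json

# Parameter names and defaults copied from the table above.
SUMMARIZE_DEFAULTS = {
    "temperature": 1.0,
    "repetition_penalty": 1.0,
    "max_length": 500,
    "min_length": 200,
    "length_penalty": 1.5,
    "bad_words": ["\n", '"', "*", "[", "]", "{", "}", ":", "(", ")", "<", ">"],
}


def summarize_payload(text: str, **overrides) -> str:
    """Build the JSON body for POST /api/summarize.

    Only documented parameter names are accepted as overrides.
    """
    unknown = set(overrides) - set(SUMMARIZE_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown summarization params: {sorted(unknown)}")
    return json.dumps({"text": text, "params": overrides})


payload = summarize_payload("A long chat log...", max_length=300, min_length=100)
print(payload)
```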

Text sentiment classification

POST /api/classify

Input

{ "text": "text to classify sentiment of" }

Output

{
    "classification": [
        {
            "label": "joy",
            "score": 1.0
        },
        {
            "label": "anger",
            "score": 0.7
        },
        {
            "label": "love",
            "score": 0.6
        },
        {
            "label": "sadness",
            "score": 0.5
        },
        {
            "label": "fear",
            "score": 0.4
        },
        {
            "label": "surprise",
            "score": 0.3
        }
    ]
}

NOTES

Sorted by descending score order

List of categories defined by the classification model

Value range from 0.0 to 1.0
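
A short sketch of consuming this response: pick the strongest emotion and list every label above a cutoff. The function names and the 0.5 threshold are illustrative choices; max() is used so the code does not depend on the server's sort order.

```python
import json


def top_emotion(response_body: str) -> tuple[str, float]:
    """Return the (label, score) pair with the highest score."""
    items = json.loads(response_body)["classification"]
    best = max(items, key=lambda item: item["score"])
    return best["label"], best["score"]


def emotions_above(response_body: str, threshold: float) -> list[str]:
    """Return the labels whose score is at least the threshold."""
    items = json.loads(response_body)["classification"]
    return [item["label"] for item in items if item["score"] >= threshold]


# Shortened version of the sample output above.
sample = json.dumps({"classification": [
    {"label": "joy", "score": 1.0},
    {"label": "anger", "score": 0.7},
    {"label": "fear", "score": 0.4},
]})
print(top_emotion(sample))          # ('joy', 1.0)
print(emotions_above(sample, 0.5))  # ['joy', 'anger']
```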

Stable Diffusion image generation

POST /api/image

Input

{ "prompt": "prompt to be generated", "sampler": "DDIM", "steps": 20, "scale": 6, "model": "model_name" }

Output

{ "image": "base64 encoded image" }

NOTES

Only the "prompt" parameter is required

Both "sampler" and "model" parameters only work when using a remote SD backend

Get available Stable Diffusion models

GET /api/image/models

Output

{ "models": [list of all available model names] }

Get available Stable Diffusion samplers

GET /api/image/samplers

Output

{ "samplers": [list of all available sampler names] }

Get currently loaded Stable Diffusion model

GET /api/image/model

Output

{ "model": "name of the current loaded model" }

Load a Stable Diffusion model (remote)

POST /api/image/model

Input

{ "model": "name of the model to load" }

Output

{ "previous_model": "name of the previous model", "current_model": "name of the newly loaded model" }

Generate Silero TTS voice

POST /api/tts/generate

Input

{ "speaker": "speaker voice_id", "text": "text to narrate" }

Output

WAV audio file.
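
A sketch of calling this endpoint with the standard library, assuming a local server on the default port and a JSON request body as shown above. The helper names are made up; only the request is built here, nothing is sent until save_wav is called.

```python
import json
import urllib.request


def silero_tts_request(speaker: str, text: str,
                       base_url: str = "http://localhost:5100") -> urllib.request.Request:
    """Build the POST /api/tts/generate request."""
    body = json.dumps({"speaker": speaker, "text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/tts/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def save_wav(req: urllib.request.Request, path: str) -> None:
    """Send the request and write the returned WAV audio to a file."""
    with urllib.request.urlopen(req, timeout=60) as resp, open(path, "wb") as out:
        out.write(resp.read())


req = silero_tts_request("en_0", "Hello there")
print(req.get_method(), req.full_url)  # POST http://localhost:5100/api/tts/generate
```

With the server running, save_wav(req, "hello.wav") stores the audio.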

Get Silero TTS voices

GET /api/tts/speakers

Output

[
    {
        "name": "en_0",
        "preview_url": "http://127.0.0.1:5100/api/tts/sample/en_0",
        "voice_id": "en_0"
    }
]

Get Silero TTS voice sample

GET /api/tts/sample/<voice_id>

Output

WAV audio file.

Add messages to chromadb

POST /api/chromadb

Input

{
    "chat_id": "chat1 - 2023-12-31",
    "messages": [
        {
            "id": "633a4bd1-8350-46b5-9ef2-f5d27acdecb7",
            "date": 1684164339877,
            "role": "user",
            "content": "Hello, AI world!",
            "meta": "this is meta"
        },
        {
            "id": "8a2ed36b-c212-4a1b-84a3-0ffbe0896506",
            "date": 1684164411759,
            "role": "assistant",
            "content": "Hello, Hooman!"
        }
    ]
}

Output

{ "count": 2 }

Query chromadb

POST /api/chromadb/query

Input

{
    "chat_id": "chat1 - 2023-12-31",
    "query": "Hello",
    "n_results": 2
}

Output

[
    {
        "id": "633a4bd1-8350-46b5-9ef2-f5d27acdecb7",
        "date": 1684164339877,
        "role": "user",
        "content": "Hello, AI world!",
        "distance": 0.31,
        "meta": "this is meta"
    },
    {
        "id": "8a2ed36b-c212-4a1b-84a3-0ffbe0896506",
        "date": 1684164411759,
        "role": "assistant",
        "content": "Hello, Hooman!",
        "distance": 0.29
    }
]
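
The sample output above is not ordered by distance, so a client that wants the best match first may sort the results itself. This sketch assumes a smaller "distance" means a closer match, which is the usual convention for vector search but is not stated here; the function name is hypothetical.

```python
import json


def closest_first(results: list[dict]) -> list[dict]:
    """Order query results by ascending distance."""
    return sorted(results, key=lambda item: item["distance"])


# The two entries from the sample output, reduced to the fields used here.
sample = json.loads("""
[
    {"role": "user", "content": "Hello, AI world!", "distance": 0.31},
    {"role": "assistant", "content": "Hello, Hooman!", "distance": 0.29}
]
""")

for hit in closest_first(sample):
    print(f'{hit["distance"]:.2f} {hit["role"]}: {hit["content"]}')
# 0.29 assistant: Hello, Hooman!
# 0.31 user: Hello, AI world!
```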

Delete the messages from chromadb

POST /api/chromadb/purge

Input

{ "chat_id": "chat1 - 2023-04-12" }

Get a list of Edge TTS voices

GET /api/edge-tts/list

Output

[{'Name': 'Microsoft Server Speech Text to Speech Voice (af-ZA, AdriNeural)', 'ShortName': 'af-ZA-AdriNeural', 'Gender': 'Female', 'Locale': 'af-ZA', 'SuggestedCodec': 'audio-24khz-48kbitrate-mono-mp3', 'FriendlyName': 'Microsoft Adri Online (Natural) - Afrikaans (South Africa)', 'Status': 'GA', 'VoiceTag': {'ContentCategories': ['General'], 'VoicePersonalities': ['Friendly', 'Positive']}}]

Generate Edge TTS voice

POST /api/edge-tts/generate

Input

{ "text": "Text to narrate", "voice": "af-ZA-AdriNeural", "rate": 0 }

Output

MP3 audio file.

Load a Coqui TTS model

GET /api/coqui-tts/load

Input

_model (string, required): The name of the Coqui TTS model to load.
_gpu (string, optional): Use the GPU to load the model.
_progress (string, optional): Show a progress bar in the terminal.

{ "_model": "tts_models--en--jenny--jenny\\model.pth" }
{ "_gpu": "False" }
{ "_progress": "True" }

Output

"Loaded"

Get a list of Coqui TTS voices

GET /api/coqui-tts/list

Output

["tts_models--en--jenny--jenny\\model.pth", "tts_models--en--ljspeech--fast_pitch\\model_file.pth", "tts_models--en--ljspeech--glow-tts\\model_file.pth", "tts_models--en--ljspeech--neural_hmm\\model_file.pth", "tts_models--en--ljspeech--speedy-speech\\model_file.pth", "tts_models--en--ljspeech--tacotron2-DDC\\model_file.pth", "tts_models--en--ljspeech--vits\\model_file.pth", "tts_models--en--ljspeech--vits--neon\\model_file.pth.tar", "tts_models--en--multi-dataset--tortoise-v2", "tts_models--en--vctk--vits\\model_file.pth", "tts_models--et--cv--vits\\model_file.pth.tar", "tts_models--multilingual--multi-dataset--bark", "tts_models--multilingual--multi-dataset--your_tts\\model_file.pth", "tts_models--multilingual--multi-dataset--your_tts\\model_se.pth"]

Get a list of the loaded Coqui model speakers

GET /api/coqui-tts/multspeaker

Output

{"0": "female-en-5", "1": "female-en-5\n", "2": "female-pt-4\n", "3": "male-en-2", "4": "male-en-2\n", "5": "male-pt-3\n"}
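
As the sample shows, the returned names can contain trailing newlines and duplicates. For display purposes they can be cleaned up as sketched below. The function name is made up, and the README does not say whether the generate endpoint expects the index or the name as speaker_id, so this is only meant for presenting the list.

```python
def clean_speakers(raw: dict[str, str]) -> list[str]:
    """Strip whitespace and drop duplicate names, keeping index order."""
    names: list[str] = []
    for index in sorted(raw, key=int):
        name = raw[index].strip()
        if name and name not in names:
            names.append(name)
    return names


# The sample output from above.
sample = {"0": "female-en-5", "1": "female-en-5\n", "2": "female-pt-4\n",
          "3": "male-en-2", "4": "male-en-2\n", "5": "male-pt-3\n"}
print(clean_speakers(sample))
# ['female-en-5', 'female-pt-4', 'male-en-2', 'male-pt-3']
```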

Get a list of the loaded Coqui model languages

GET /api/coqui-tts/multlang

Output

{"0": "en", "1": "fr-fr", "2": "pt-br"}

Generate Coqui TTS voice

POST /api/edge-tts/generate

Input

{
  "text": "Text to narrate",
  "speaker_id": "0",
  "mspker": null,
  "language_id": null,
  "style_wav": null
}

Output

MP3 audio file.

Loads a talkinghead character by specifying the character's image URL.

GET /api/talkinghead/load

Parameters

loadchar (string, required): The URL of the character's image. The URL should point to a PNG image.

{ "loadchar": "http://localhost:8000/characters/Aqua.png" }

Example

'http://localhost:5100/api/talkinghead/load?loadchar=http://localhost:8000/characters/Aqua.png'

Output

'OK'
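
The example URL above passes the image URL unescaped. If the image URL contains characters such as & or spaces, percent-encoding the query value is the safer route. A small sketch with the standard library follows; the function name is hypothetical and the default base URL is the local default from this README.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def talkinghead_load_url(image_url: str, base_url: str = "http://localhost:5100") -> str:
    """Build the GET /api/talkinghead/load URL with an encoded loadchar value."""
    query = urlencode({"loadchar": image_url})
    return f"{base_url}/api/talkinghead/load?{query}"


url = talkinghead_load_url("http://localhost:8000/characters/Aqua.png")
print(url)
# http://localhost:5100/api/talkinghead/load?loadchar=http%3A%2F%2Flocalhost%3A8000%2Fcharacters%2FAqua.png

# Decoding the query string gives back the original image URL.
print(parse_qs(urlsplit(url).query)["loadchar"][0])
# http://localhost:8000/characters/Aqua.png
```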

Animates the talkinghead sprite to start talking.

GET /api/talkinghead/start_talking

Example

'http://localhost:5100/api/talkinghead/start_talking'

Output

"started"

Animates the talkinghead sprite to stop talking.

GET /api/talkinghead/stop_talking

Example

'http://localhost:5100/api/talkinghead/stop_talking'

Output

"stopped"

Outputs the animated talkinghead sprite.

GET /api/talkinghead/result_feed

Output

Animated transparent image

Perform web search

POST /api/websearch

Available engines: google (default), duckduckgo

Input

{ "query": "what is beauty?", "engine": "google" }

Output

{ "results": "that would fall within the purview of your conundrums of philosophy", "links": ["http://example.com"] }
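
A sketch of building the request body and reading the response, limited to the two engines listed above. The function names are illustrative, and the response shape is taken from the sample output.

```python
import json

ENGINES = ("google", "duckduckgo")  # engines listed in this README


def websearch_payload(query: str, engine: str = "google") -> str:
    """Build the JSON body for POST /api/websearch."""
    if engine not in ENGINES:
        raise ValueError(f"unsupported engine {engine!r}; expected one of {ENGINES}")
    return json.dumps({"query": query, "engine": engine})


def read_results(response_body: str) -> tuple[str, list[str]]:
    """Return the result text and the list of source links."""
    data = json.loads(response_body)
    return data["results"], data.get("links", [])


print(websearch_payload("what is beauty?", engine="duckduckgo"))
# {"query": "what is beauty?", "engine": "duckduckgo"}
```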
