Lightweight ComfyUI wrapper for IndexTTS 2 (voice cloning + emotion control). The nodes call the upstream inference code directly, so behaviour stays consistent with the original repo.

Original repo: https://github.com/index-tts/index-tts

Updates

2025-09-22: Added IndexTTS2 Advanced node exposing sampling, speed, seed, and other generation controls.

Install

Clone this repository into ComfyUI/custom_nodes/

Inside your ComfyUI Python environment, from the cloned repository folder:

pip install wetext
pip install -r requirements.txt

Models

Create checkpoints/ in the repo root and copy the IndexTTS-2 release there (https://huggingface.co/IndexTeam/IndexTTS-2/tree/main). Any files that are missing are downloaded from Hugging Face and cached automatically.
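
If you prefer to fetch the release with a script rather than copying it by hand, the step above can be sketched roughly as follows. This is not the wrapper's own download logic: it assumes the huggingface_hub package is installed, and the helper names ensure_checkpoints_dir and download_release are hypothetical.

```python
from pathlib import Path


def ensure_checkpoints_dir(repo_root):
    """Create <repo_root>/checkpoints if it does not exist and return its path."""
    ckpt = Path(repo_root) / "checkpoints"
    ckpt.mkdir(parents=True, exist_ok=True)
    return ckpt


def download_release(repo_root, repo_id="IndexTeam/IndexTTS-2"):
    """Download the model release into checkpoints/ (hypothetical helper).

    Assumes huggingface_hub is installed; it is imported lazily so the
    directory helper above works without it.
    """
    from huggingface_hub import snapshot_download

    ckpt = ensure_checkpoints_dir(repo_root)
    snapshot_download(repo_id=repo_id, local_dir=str(ckpt))
    return ckpt


# Example (run from the cloned repository folder; needs network access):
# download_release(".")
```

The download is left commented out so that importing the snippet never triggers a large transfer by accident.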

Nodes

IndexTTS2 Simple – speaker audio, text, optional emotion audio/vector; outputs audio + status string. Selects the device automatically and uses FP16 on CUDA.
IndexTTS2 Advanced – Simple inputs plus overrides for sampling, speech speed, pauses, CFG, seed.
IndexTTS2 Emotion Vector – eight sliders (0.0–1.4, sum <= 1.5) producing an emotion vector.
IndexTTS2 Emotion From Text – requires ModelScope and local QwenEmotion; turns short text into an emotion vector + summary.
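
The Emotion Vector limits (eight values, each within 0.0–1.4, total at most 1.5) can be illustrated with a small sketch. It is an assumption about how one might pre-check values, not the node's actual code: this README does not say whether the node rejects or rescales an over-limit vector, so the hypothetical helper below clamps each value and then scales the vector down proportionally.

```python
MAX_SLIDER = 1.4   # upper bound of each slider, per the node description
MAX_SUM = 1.5      # upper bound on the sum of all sliders
NUM_EMOTIONS = 8   # the node exposes eight sliders


def clamp_emotion_vector(values):
    """Return a copy of `values` that satisfies the slider and sum limits.

    Hypothetical helper: clamps each entry to [0.0, MAX_SLIDER], then, if
    the total still exceeds MAX_SUM, scales every entry by the same factor.
    """
    if len(values) != NUM_EMOTIONS:
        raise ValueError(f"expected {NUM_EMOTIONS} values, got {len(values)}")

    clamped = [min(max(float(v), 0.0), MAX_SLIDER) for v in values]
    total = sum(clamped)
    if total > MAX_SUM:
        scale = MAX_SUM / total
        clamped = [v * scale for v in clamped]
    return clamped


if __name__ == "__main__":
    # Two sliders at maximum would sum to 2.8, so both are scaled down.
    print(clamp_emotion_vector([1.4, 1.4, 0, 0, 0, 0, 0, 0]))
```

Proportional scaling keeps the relative balance between emotions; rejecting the input outright would be an equally valid choice.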

Examples

Speaker audio -> IndexTTS2 Simple -> Preview/Save Audio
Speaker + emotion audio -> IndexTTS2 Simple -> Save
Emotion Vector -> IndexTTS2 Simple -> Save
Emotion From Text -> IndexTTS2 Simple -> Save

Troubleshooting

Windows only so far; DeepSpeed is disabled.
Install wetext if the module is missing on first launch.
Emotion vector sum must stay <= 1.5.

ComfyUI-IndexTTS2