mirror of
https://github.com/Comfy-Org/ComfyUI_frontend.git
synced 2026-04-20 14:30:41 +00:00
Migrate workflow_templates/site into the frontend monorepo as apps/hub so the hub can use @comfyorg/design-system and shared packages. Changes to existing files: - pnpm-workspace.yaml: add @astrojs/sitemap, @astrojs/vercel, lucide-vue-next - eslint.config.ts: add hub ignores and i18n/import rule overrides - .oxlintrc.json: add hub scripts to ignore patterns - knip.config.ts: add hub workspace config apps/hub adaptations from source: - Replace local cn() with @comfyorg/tailwind-utils (19 files) - Integrate @comfyorg/design-system/css/base.css in global.css - Make TEMPLATES_DIR configurable via HUB_TEMPLATES_DIR env var - Add HUB_SKIP_SYNC flag for builds without template data - Remove Vite 8-incompatible rollupOptions.output.manualChunks - Fix stylelint violations (modern color notation, number precision) - Gitignore generated content (thumbnails, synced templates, AI cache)
1.8 KiB
1.8 KiB
Chatterbox
Chatterbox is a family of state-of-the-art open-source text-to-speech models developed by Resemble AI, featuring zero-shot voice cloning and emotion control.
Model Variants
Chatterbox Turbo
- 350M parameters, single-step mel decoding for low latency
- Paralinguistic tags for non-speech sounds ([laugh], [cough], [chuckle])
- English only, optimized for voice agents and production use
Chatterbox (Original)
- 500M parameter Llama backbone, English only
- CFG and exaggeration control for emotion intensity
Chatterbox Multilingual
- 500M parameters, 23 languages (Arabic, Chinese, French, German, Hindi, Japanese, Korean, Spanish, and more)
- Zero-shot voice cloning across languages
Key Features
- Zero-shot voice cloning from a few seconds of reference audio
- Emotion exaggeration control (first open-source model with this feature)
- Built-in PerTh neural watermarking for responsible AI
- Sub-200ms latency for real-time applications
- Trained on 500K hours of cleaned speech data
- MIT license (free for commercial use)
- Outperforms ElevenLabs in subjective evaluations
Hardware Requirements
- Minimum: NVIDIA GPU with CUDA support
- Turbo model requires less VRAM than original due to smaller architecture
- Runs on consumer GPUs (RTX 3060 and above)
- CPU inference possible but significantly slower
Common Use Cases
- Voice cloning for content creation
- AI voice agents and assistants
- Audiobook narration
- Game and media dialogue generation
Key Parameters
- exaggeration: Emotion intensity control (0.0 to 1.0, default 0.5)
- cfg_weight: Classifier-free guidance weight (0.0 to 1.0, default 0.5)
- audio_prompt_path: Path to reference audio clip for voice cloning
- language_id: Language code for multilingual model (e.g., "fr", "zh", "ja")