Chatterbox

Chatterbox is a family of state-of-the-art open-source text-to-speech models developed by Resemble AI, featuring zero-shot voice cloning and emotion control.

Model Variants

Chatterbox Turbo

  • 350M parameters, single-step mel decoding for low latency
  • Paralinguistic tags for non-speech sounds ([laugh], [cough], [chuckle])
  • English only, optimized for voice agents and production use
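Paralinguistic tags are written inline in the prompt text, wrapped in square brackets. A minimal pre-flight check can catch typo'd tags before generation; this sketch assumes only the three tags listed above, and the `unknown_tags` helper is hypothetical (the full supported set may be larger):

```python
import re

# Tag names taken from the list above; the full supported set is an assumption.
KNOWN_TAGS = {"laugh", "cough", "chuckle"}

def unknown_tags(text: str) -> list[str]:
    """Hypothetical pre-flight check: collect bracketed tags not in KNOWN_TAGS."""
    return [t for t in re.findall(r"\[([a-z]+)\]", text) if t not in KNOWN_TAGS]

script = "That was close [chuckle]. Excuse me [cough], let's continue."
print(unknown_tags(script))  # an empty list means every tag is recognized
```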

Chatterbox (Original)

  • 500M parameter Llama backbone, English only
  • CFG and exaggeration control for emotion intensity

Chatterbox Multilingual

  • 500M parameters, 23 languages (Arabic, Chinese, French, German, Hindi, Japanese, Korean, Spanish, and more)
  • Zero-shot voice cloning across languages
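The three variants split cleanly along two axes: language and latency. A hypothetical selection helper, using illustrative model identifiers and a subset of the 23 supported language codes (both are assumptions, not official names):

```python
# Illustrative subset of the 23 supported languages; codes are assumptions.
MULTILINGUAL_LANGS = {"ar", "zh", "fr", "de", "hi", "ja", "ko", "es", "en"}

def pick_variant(language_id: str, low_latency: bool = False) -> str:
    """Hypothetical helper: map language and latency needs to a Chatterbox variant."""
    if language_id != "en":
        if language_id not in MULTILINGUAL_LANGS:
            raise ValueError(f"unsupported language: {language_id}")
        return "chatterbox-multilingual"   # only variant with non-English support
    return "chatterbox-turbo" if low_latency else "chatterbox"

print(pick_variant("fr"))                    # chatterbox-multilingual
print(pick_variant("en", low_latency=True))  # chatterbox-turbo
```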

Key Features

  • Zero-shot voice cloning from a few seconds of reference audio
  • Emotion exaggeration control (Resemble AI describes it as the first open-source TTS model with this feature)
  • Built-in PerTh neural watermarking for responsible AI
  • Sub-200ms latency for real-time applications
  • Trained on 500K hours of cleaned speech data
  • MIT license (free for commercial use)
  • Outperformed ElevenLabs in Resemble AI's subjective listening evaluations

Hardware Requirements

  • Recommended: NVIDIA GPU with CUDA support
  • Turbo requires less VRAM than the original model due to its smaller architecture
  • Runs on consumer GPUs (RTX 3060 and above)
  • CPU inference is possible but significantly slower
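The GPU/CPU trade-off above can be encoded as a simple device-selection rule. This is a generic sketch (the `pick_device` helper is hypothetical); in practice you would pass something like `torch.cuda.is_available()` as the first argument:

```python
def pick_device(cuda_available: bool, allow_cpu: bool = True) -> str:
    """Hypothetical helper: prefer CUDA, fall back to CPU only if allowed."""
    if cuda_available:
        return "cuda"
    if allow_cpu:
        return "cpu"  # works, but expect significantly slower generation
    raise RuntimeError("no CUDA device available and CPU fallback disabled")

print(pick_device(cuda_available=False))  # cpu
```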

Common Use Cases

  • Voice cloning for content creation
  • AI voice agents and assistants
  • Audiobook narration
  • Game and media dialogue generation

Key Parameters

  • exaggeration: Emotion intensity control (0.0 to 1.0, default 0.5)
  • cfg_weight: Classifier-free guidance weight (0.0 to 1.0, default 0.5)
  • audio_prompt_path: Path to reference audio clip for voice cloning
  • language_id: Language code for multilingual model (e.g., "fr", "zh", "ja")