mirror of https://github.com/Comfy-Org/ComfyUI_frontend.git synced 2026-04-20 14:30:41 +00:00

dante01yoon bbd0a6b201 feat: migrate workflow template site as apps/hub

Migrate workflow_templates/site into the frontend monorepo as apps/hub
so the hub can use @comfyorg/design-system and shared packages.

Changes to existing files:
- pnpm-workspace.yaml: add @astrojs/sitemap, @astrojs/vercel, lucide-vue-next
- eslint.config.ts: add hub ignores and i18n/import rule overrides
- .oxlintrc.json: add hub scripts to ignore patterns
- knip.config.ts: add hub workspace config

apps/hub adaptations from source:
- Replace local cn() with @comfyorg/tailwind-utils (19 files)
- Integrate @comfyorg/design-system/css/base.css in global.css
- Make TEMPLATES_DIR configurable via HUB_TEMPLATES_DIR env var
- Add HUB_SKIP_SYNC flag for builds without template data
- Remove Vite 8-incompatible rollupOptions.output.manualChunks
- Fix stylelint violations (modern color notation, number precision)
- Gitignore generated content (thumbnails, synced templates, AI cache)

2026-04-06 20:53:13 +09:00

Chatterbox

Chatterbox is a family of state-of-the-art open-source text-to-speech models developed by Resemble AI, featuring zero-shot voice cloning and emotion control.

Model Variants

Chatterbox Turbo

350M parameters, single-step mel decoding for low latency
Paralinguistic tags for non-speech sounds ([laugh], [cough], [chuckle])
English only, optimized for voice agents and production use
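
A minimal sketch of how a script could be checked for paralinguistic tags before it is sent to Turbo. The helper names (`find_tags`, `unknown_tags`) and the `KNOWN_TAGS` set are hypothetical and not part of any Chatterbox API; the set contains only the three tags named above, and the model may support others.

```python
import re

# Only the tags named in the list above; this is an assumption, not the
# model's full supported set.
KNOWN_TAGS = {"laugh", "cough", "chuckle"}

# Matches a lowercase word wrapped in square brackets, e.g. "[laugh]".
TAG_PATTERN = re.compile(r"\[([a-z_]+)\]")


def find_tags(text: str) -> list[str]:
    """Return every bracketed tag name in the order it appears."""
    return TAG_PATTERN.findall(text)


def unknown_tags(text: str) -> list[str]:
    """Return the sorted tag names in the text that are not in KNOWN_TAGS."""
    return sorted(set(find_tags(text)) - KNOWN_TAGS)


script = "Hi there [chuckle], sorry [cough] about that. Oh [sigh] well."
print(find_tags(script))
print(unknown_tags(script))
```

Flagging unrecognized tags up front is one way to avoid having them read aloud as literal text, assuming the model treats unsupported tags that way.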

Chatterbox (Original)

500M-parameter Llama backbone, English only
CFG and exaggeration control for emotion intensity

Chatterbox Multilingual

500M parameters, 23 languages (Arabic, Chinese, French, German, Hindi, Japanese, Korean, Spanish, and more)
Zero-shot voice cloning across languages

Key Features

Zero-shot voice cloning from a few seconds of reference audio
Emotion exaggeration control (described by Resemble AI as the first open-source TTS model with this feature)
Built-in PerTh neural watermarking of generated audio, intended to support responsible use
Sub-200ms latency for real-time applications
Trained on 500K hours of cleaned speech data
MIT license (free for commercial use)
Preferred over ElevenLabs in side-by-side subjective listening evaluations reported by Resemble AI

Hardware Requirements

Recommended: NVIDIA GPU with CUDA support
The Turbo model requires less VRAM than the original because of its smaller (350M-parameter) architecture
Runs on consumer GPUs (RTX 3060 and above)
CPU inference possible but significantly slower

Common Use Cases

Voice cloning for content creation
AI voice agents and assistants
Audiobook narration
Game and media dialogue generation

Key Parameters

exaggeration: Emotion intensity control (0.0 to 1.0, default 0.5)
cfg_weight: Classifier-free guidance weight (0.0 to 1.0, default 0.5)
audio_prompt_path: Path to reference audio clip for voice cloning
language_id: Language code for multilingual model (e.g., "fr", "zh", "ja")
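
The parameters above can be collected and range-checked before they are handed to a generation call. The sketch below is an illustration only: `ChatterboxParams` is a hypothetical container, not a class from the Chatterbox library or from ComfyUI, and the 0.0 to 1.0 ranges are taken from the list above (a given integration may accept different ranges).

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ChatterboxParams:
    """Hypothetical holder for the parameters listed above."""

    exaggeration: float = 0.5                 # emotion intensity
    cfg_weight: float = 0.5                   # classifier-free guidance weight
    audio_prompt_path: Optional[str] = None   # reference clip for voice cloning
    language_id: Optional[str] = None         # e.g. "fr", multilingual model only

    def validate(self) -> None:
        """Raise ValueError if a numeric parameter is outside the documented range."""
        for name in ("exaggeration", "cfg_weight"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")

    def to_kwargs(self) -> dict:
        """Validate, then return only the parameters that were actually set."""
        self.validate()
        return {key: value for key, value in asdict(self).items() if value is not None}


params = ChatterboxParams(exaggeration=0.7, cfg_weight=0.3, language_id="fr")
print(params.to_kwargs())
```

Dropping unset optional values keeps `language_id` out of calls to the English-only variants, which is one reasonable design under the assumption that those variants do not accept it.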
