ACE-Step

ACE-Step is a foundation model for music generation developed by ACE Studio and StepFun. It uses diffusion-based generation with a Deep Compression AutoEncoder (DCAE) and a lightweight linear transformer to achieve state-of-the-art speed and musical coherence.

Model Variants

ACE-Step (3.5B)

  • 3.5B parameter diffusion model
  • DCAE encoder with linear transformer conditioning
  • 27 or 60 inference steps recommended
  • Apache 2.0 license

Key Features

  • 15x faster than LLM-based baselines (20 seconds for a 4-minute song on A100)
  • Full-song generation with lyrics and structure
  • Duration control for variable-length output
  • Music remixing and style transfer
  • Lyrics editing and vocal synthesis
  • Supports 16+ languages, including English, Chinese, Japanese, Korean, French, German, and Spanish
  • Text-to-music from natural language descriptions

Hardware Performance

  • RTX 3090: 12.76x real-time factor at 27 steps
  • RTX 4090: 34.48x real-time factor at 27 steps
  • NVIDIA A100: 27.27x real-time factor at 27 steps
  • Apple M2 Max: 2.27x real-time factor at 27 steps
  • Higher step counts (60) reduce speed by roughly half
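A real-time factor (RTF) is audio duration divided by generation time, so wall-clock time is duration divided by RTF. A minimal sketch of that arithmetic (the helper name is illustrative, not part of any ACE-Step API):

```python
def estimated_generation_seconds(audio_seconds: float, rtf: float) -> float:
    """Estimate wall-clock generation time from a real-time factor.

    RTF = audio duration / generation time, so generation time
    is the audio duration divided by the RTF.
    """
    if rtf <= 0:
        raise ValueError("real-time factor must be positive")
    return audio_seconds / rtf


# A 4-minute (240 s) song on an RTX 4090 at 27 steps (RTF ~34.48):
print(round(estimated_generation_seconds(240, 34.48), 1))  # ~7.0 seconds
```

By the same arithmetic, halving the RTF at 60 steps roughly doubles the generation time.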

Common Use Cases

  • Original music generation from text descriptions
  • Song remixing and style transfer
  • Lyrics-based music creation
  • Multi-language vocal music generation
  • Rapid music prototyping for content creators
  • Background music and soundtrack generation

Key Parameters

  • steps: Inference steps (27 for speed, 60 for quality)
  • duration: Target audio length in seconds (up to ~5 minutes)
  • lyrics: Song lyrics text input for vocal generation
  • prompt: Natural language description of desired music style and mood
  • seed: Random seed for reproducible generation (results are seed-sensitive)
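The parameters above might be assembled into a generation request like the following sketch; the function and exact field names are hypothetical and do not correspond to a real ACE-Step or ComfyUI API:

```python
def make_ace_step_params(
    prompt: str,
    lyrics: str = "",
    duration: float = 60.0,
    steps: int = 27,
    seed: int = 0,
) -> dict:
    """Bundle ACE-Step generation parameters (hypothetical helper).

    steps: 27 favors speed, 60 favors quality.
    duration: target audio length in seconds.
    seed: fixed for reproducibility; results are seed-sensitive.
    """
    return {
        "prompt": prompt,
        "lyrics": lyrics,
        "duration": duration,
        "steps": steps,
        "seed": seed,
    }


params = make_ace_step_params(
    prompt="upbeat synth-pop, bright pads, driving bass",
    lyrics="[verse]\nNeon lights across the bay",
    duration=120.0,
    steps=27,
    seed=42,
)
```

Keeping the seed fixed while varying only the prompt or lyrics is the usual way to compare outputs, since generation is seed-sensitive.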