ComfyUI_frontend/apps/hub/knowledge/models/stable-audio.md
dante01yoon bbd0a6b201 feat: migrate workflow template site as apps/hub
2026-04-06 20:53:13 +09:00


Stable Audio Open

Stable Audio Open 1.0 is Stability AI's open-weights text-to-audio model for generating sound effects, production elements, and short musical clips.

Model Variants

Stable Audio Open 1.0

  • 1.2B parameter latent diffusion model
  • Transformer-based diffusion (DiT) architecture
  • T5-base text encoder for conditioning
  • Variational autoencoder for audio compression
  • Stability AI Community License (free for research and non-commercial use; commercial use is subject to the license terms)

Stable Audio (Commercial)

  • Full-length music generation up to 3 minutes with audio-to-audio and inpainting
  • Available via Stability AI platform API, commercial license

Key Features

  • Generates up to 47 seconds of stereo audio at 44.1 kHz
  • Text-prompted sound effects, drum beats, ambient sounds, and foley
  • Variable-length output with timing control
  • Fine-tunable on custom audio datasets
  • Trained exclusively on Creative Commons licensed audio (CC0, CC BY, CC Sampling+)
  • Strong performance for sound effects and field recordings
  • Compatible with both stable-audio-tools and diffusers libraries
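
To make the output figures above concrete, the following is a minimal sketch of the arithmetic behind a maximum-length clip. The helper names are hypothetical, and the 16-bit PCM sample width is an assumption for illustration only; the document does not state an output bit depth.

```python
# Back-of-the-envelope sizing for a maximum-length clip.
# Figures from the text: 47 seconds, stereo, 44.1 kHz.
SAMPLE_RATE_HZ = 44_100
CHANNELS = 2          # stereo
MAX_SECONDS = 47


def frames_for(seconds: float, sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Number of sample frames (one frame = one sample per channel)."""
    return round(seconds * sample_rate_hz)


def pcm16_bytes(seconds: float, channels: int = CHANNELS) -> int:
    """Raw size in bytes, ASSUMING 16-bit (2-byte) PCM samples."""
    return frames_for(seconds) * channels * 2


if __name__ == "__main__":
    print("frames per channel:", frames_for(MAX_SECONDS))
    print("total samples:", frames_for(MAX_SECONDS) * CHANNELS)
    print("raw PCM16 bytes:", pcm16_bytes(MAX_SECONDS))
```

A full 47-second stereo clip is therefore about 2.07 million frames per channel, or roughly 8.3 MB as uncompressed 16-bit PCM before any container overhead.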

Hardware Requirements

  • Minimum: 8 GB VRAM (fp16)
  • Recommended: 12 GB+ VRAM for comfortable inference
  • Half-precision (fp16) supported for reduced memory
  • Chunked decoding available for memory-constrained setups
  • Inference speed: 8-20 diffusion steps per second depending on GPU
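
As a rough illustration of how these numbers combine, the sketch below estimates weight memory from the 1.2B parameter count and sampling time from the quoted 8-20 steps per second. The function names are hypothetical. The estimates cover only the model weights and the diffusion loop, so they exclude activations, text encoding, and autoencoder decoding, which is one reason the practical VRAM minimum is higher than the weight size alone.

```python
# Rough estimates only: weights and diffusion loop, nothing else.
PARAM_COUNT = 1.2e9   # total parameters, from the model description


def weight_gb(params: float, bytes_per_param: int) -> float:
    """Memory taken by the weights alone, in decimal gigabytes."""
    return params * bytes_per_param / 1e9


def sampling_seconds(steps: int, steps_per_second: float) -> float:
    """Time spent in the diffusion loop at a given throughput."""
    return steps / steps_per_second


if __name__ == "__main__":
    print(f"fp16 weights: {weight_gb(PARAM_COUNT, 2):.1f} GB")
    print(f"fp32 weights: {weight_gb(PARAM_COUNT, 4):.1f} GB")
    for steps in (100, 200):
        slow = sampling_seconds(steps, 8)    # lower end of quoted range
        fast = sampling_seconds(steps, 20)   # upper end of quoted range
        print(f"{steps} steps: {fast:.1f}-{slow:.1f} s of sampling")
```

Under these assumptions, 100 steps take roughly 5 to 12.5 seconds of sampling and 200 steps take 10 to 25 seconds, depending on where a GPU falls in the quoted range.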

Common Use Cases

  • Sound effect and foley generation
  • Drum beats and instrument riff creation
  • Ambient soundscapes and background audio
  • Music production elements and samples
  • Audio prototyping for film and game sound design

Key Parameters

  • steps: Number of inference steps (100-200 recommended)
  • cfg_scale: Classifier-free guidance scale (typically 7)
  • seconds_total: Target audio duration (up to 47 seconds)
  • seconds_start: Start offset in seconds, used together with seconds_total for timing control
  • negative_prompt: Text describing undesired audio qualities
  • sampler_type: Diffusion sampler (dpmpp-3m-sde recommended)
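
The parameters above can be collected into one validated object and then translated into the argument names each library expects. The sketch below is illustrative rather than an official API: `AudioRequest` and its methods are hypothetical helpers, the 30-second default is arbitrary, and the keyword names are assumptions based on the `StableAudioPipeline` call signature in diffusers and on `generate_diffusion_cond` in stable-audio-tools. Check them against the installed versions before relying on them.

```python
from dataclasses import dataclass
from typing import Any, Optional

MAX_SECONDS = 47.0  # upper bound stated for Stable Audio Open 1.0


@dataclass(frozen=True)
class AudioRequest:
    """Hypothetical container for the parameters listed above."""
    prompt: str
    negative_prompt: Optional[str] = None
    steps: int = 100
    cfg_scale: float = 7.0
    seconds_start: float = 0.0
    seconds_total: float = 30.0
    sampler_type: str = "dpmpp-3m-sde"

    def __post_init__(self) -> None:
        if not self.prompt.strip():
            raise ValueError("prompt must not be empty")
        if self.steps <= 0:
            raise ValueError("steps must be positive")
        if not 0 < self.seconds_total <= MAX_SECONDS:
            raise ValueError(f"seconds_total must be in (0, {MAX_SECONDS}]")
        if not 0 <= self.seconds_start < self.seconds_total:
            raise ValueError("seconds_start must be in [0, seconds_total)")

    def to_diffusers_kwargs(self) -> dict[str, Any]:
        """Keyword names assumed from diffusers' StableAudioPipeline.

        sampler_type has no keyword here; diffusers selects the sampler
        through the pipeline's scheduler object instead.
        """
        return {
            "prompt": self.prompt,
            "negative_prompt": self.negative_prompt,
            "num_inference_steps": self.steps,
            "guidance_scale": self.cfg_scale,
            "audio_start_in_s": self.seconds_start,
            "audio_end_in_s": self.seconds_total,
        }

    def to_stable_audio_tools(self) -> tuple[list[dict[str, Any]], dict[str, Any]]:
        """Split into (conditioning, sampler kwargs) as assumed for
        stable-audio-tools' generate_diffusion_cond."""
        conditioning = [{
            "prompt": self.prompt,
            "seconds_start": self.seconds_start,
            "seconds_total": self.seconds_total,
        }]
        sampler_kwargs = {
            "steps": self.steps,
            "cfg_scale": self.cfg_scale,
            "sampler_type": self.sampler_type,
        }
        return conditioning, sampler_kwargs


if __name__ == "__main__":
    req = AudioRequest(prompt="128 BPM tech house drum loop",
                       negative_prompt="low quality")
    print(req.to_diffusers_kwargs())
    print(req.to_stable_audio_tools())
```

Validating at construction time means an out-of-range duration fails before any model is loaded. The resulting dictionaries would be unpacked into the respective library call, for example `pipe(**req.to_diffusers_kwargs())` under the naming assumptions stated above.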