mirror of
https://github.com/Comfy-Org/ComfyUI_frontend.git
synced 2026-04-20 14:30:41 +00:00
Migrate workflow_templates/site into the frontend monorepo as apps/hub so the hub can use @comfyorg/design-system and shared packages. Changes to existing files: - pnpm-workspace.yaml: add @astrojs/sitemap, @astrojs/vercel, lucide-vue-next - eslint.config.ts: add hub ignores and i18n/import rule overrides - .oxlintrc.json: add hub scripts to ignore patterns - knip.config.ts: add hub workspace config apps/hub adaptations from source: - Replace local cn() with @comfyorg/tailwind-utils (19 files) - Integrate @comfyorg/design-system/css/base.css in global.css - Make TEMPLATES_DIR configurable via HUB_TEMPLATES_DIR env var - Add HUB_SKIP_SYNC flag for builds without template data - Remove Vite 8-incompatible rollupOptions.output.manualChunks - Fix stylelint violations (modern color notation, number precision) - Gitignore generated content (thumbnails, synced templates, AI cache)
1.9 KiB
1.9 KiB
Stable Audio Open
Stable Audio Open 1.0 is Stability AI's open-source text-to-audio model for generating sound effects, production elements, and short musical clips.
Model Variants
Stable Audio Open 1.0
- 1.2B parameter latent diffusion model
- Transformer-based diffusion (DiT) architecture
- T5-base text encoder for conditioning
- Variational autoencoder for audio compression
- Stability AI Community License (non-commercial)
Stable Audio (Commercial)
- Full-length music generation up to 3 minutes with audio-to-audio and inpainting
- Available via Stability AI platform API, commercial license
Key Features
- Generates up to 47 seconds of stereo audio at 44.1kHz
- Text-prompted sound effects, drum beats, ambient sounds, and foley
- Variable-length output with timing control
- Fine-tunable on custom audio datasets
- Trained exclusively on Creative Commons licensed audio (CC0, CC BY, CC Sampling+)
- Strong performance for sound effects and field recordings
- Compatible with both stable-audio-tools and diffusers libraries
Hardware Requirements
- Minimum: 8GB VRAM (fp16)
- Recommended: 12GB+ VRAM for comfortable inference
- Half-precision (fp16) supported for reduced memory
- Chunked decoding available for memory-constrained setups
- Inference speed: 8-20 diffusion steps per second depending on GPU
Common Use Cases
- Sound effect and foley generation
- Drum beats and instrument riff creation
- Ambient soundscapes and background audio
- Music production elements and samples
- Audio prototyping for film and game sound design
Key Parameters
- steps: Number of inference steps (100-200 recommended)
- cfg_scale: Classifier-free guidance scale (typically 7)
- seconds_total: Target audio duration (up to 47 seconds)
- seconds_start: Start time offset for timing control
- negative_prompt: Text describing undesired audio qualities
- sampler_type: Diffusion sampler (dpmpp-3m-sde recommended)