Stable Diffusion 3.5
Stable Diffusion 3.5 is Stability AI's text-to-image model family based on the Multimodal Diffusion Transformer (MMDiT) architecture with rectified flow matching.
Model Variants
Stable Diffusion 3.5 Large
- 8.1 billion parameter MMDiT model
- Highest quality and prompt adherence in the SD family
- 1 megapixel native resolution (1024×1024)
- 28-50 inference steps recommended
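A minimal text-to-image sketch for SD 3.5 Large using Hugging Face diffusers (a recent diffusers release with SD3 support is assumed, as are the checkpoint name and prompt; steps and guidance follow the recommendations above):

```python
import torch
from diffusers import StableDiffusion3Pipeline

# Load SD 3.5 Large in bf16; roughly 24 GB of VRAM without quantization or offloading.
pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large",
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe(
    "a photograph of a red fox in a snowy birch forest",
    num_inference_steps=28,  # 28-50 recommended for Large
    guidance_scale=4.5,
).images[0]
image.save("fox.png")
```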
Stable Diffusion 3.5 Large Turbo
- Distilled version of SD 3.5 Large
- 4-step inference for fast generation
- Guidance scale of 0 (classifier-free guidance disabled)
- Comparable quality to the full model in a fraction of the time
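Because Turbo is distilled for few-step sampling with CFG disabled, a generation call differs only in checkpoint, step count, and guidance scale (a sketch; checkpoint name and prompt are assumptions):

```python
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large-turbo",
    torch_dtype=torch.bfloat16,
).to("cuda")

# guidance_scale=0.0 disables classifier-free guidance, so negative prompts are ignored.
image = pipe(
    "a watercolor painting of a lighthouse at dawn",
    num_inference_steps=4,
    guidance_scale=0.0,
).images[0]
```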
Stable Diffusion 3.5 Medium
- 2.5 billion parameter MMDiT-X architecture
- Designed for consumer hardware (9.9 GB VRAM for the transformer)
- Dual attention blocks in first 12 transformer layers
- Multi-resolution generation from 0.25 to 2 megapixels
- Skip Layer Guidance recommended for better coherence
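Recent diffusers releases expose Skip Layer Guidance on the SD3 pipeline through a skip_guidance_layers argument; a hedged sketch for Medium (the [7, 8, 9] layer indices are a commonly cited starting point rather than a value from this document, and older diffusers versions lack the argument):

```python
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-medium",
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe(
    "an isometric 3D render of a cozy coffee shop",
    num_inference_steps=40,          # 20-40 recommended for Medium
    guidance_scale=4.5,
    skip_guidance_layers=[7, 8, 9],  # Skip Layer Guidance; assumed indices
).images[0]
```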
Key Features
- Three text encoders: CLIP ViT-L, OpenCLIP ViT-bigG (77 tokens each), T5-XXL (256 tokens)
- QK-normalization for stable training and easier fine-tuning
- Rectified flow matching replaces traditional DDPM/DDIM sampling
- Strong text rendering and typography in generated images
- Diverse output styles (photography, 3D, painting, line art)
- Highly customizable base for fine-tuning and LoRA training
- T5-XXL encoder optional (can be removed to save memory with minimal quality loss; see the sketch after this list)
- Supports negative prompts for excluding unwanted elements
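To realize the memory saving from dropping T5-XXL, diffusers lets you pass None for the third text encoder and tokenizer at load time; prompts are then encoded by the two CLIP models alone (a sketch using the Large checkpoint):

```python
import torch
from diffusers import StableDiffusion3Pipeline

# Omit the T5-XXL encoder: a large memory saving for a usually small quality loss.
pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large",
    text_encoder_3=None,
    tokenizer_3=None,
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe(
    "a macro photo of dew on a spider web",
    num_inference_steps=28,
    guidance_scale=4.5,
).images[0]
```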
Hardware Requirements
- Large: 24 GB+ VRAM recommended (fp16); can be quantized to fit on smaller GPUs
- Large Turbo: 16 GB+ VRAM recommended
- Medium: 10 GB VRAM minimum (excluding text encoders)
- NF4 quantization available via bitsandbytes for low-VRAM GPUs
- CPU offloading supported via diffusers pipeline
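A low-VRAM sketch combining the last two points: quantize the transformer to NF4 with bitsandbytes via diffusers' BitsAndBytesConfig, then let the pipeline offload idle components to the CPU (assumes bitsandbytes and accelerate are installed):

```python
import torch
from diffusers import BitsAndBytesConfig, SD3Transformer2DModel, StableDiffusion3Pipeline

model_id = "stabilityai/stable-diffusion-3.5-large"

# Quantize the 8.1B-parameter transformer to 4-bit NF4.
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
transformer = SD3Transformer2DModel.from_pretrained(
    model_id,
    subfolder="transformer",
    quantization_config=nf4_config,
    torch_dtype=torch.bfloat16,
)

pipe = StableDiffusion3Pipeline.from_pretrained(
    model_id,
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()  # keep idle submodules on the CPU between calls

image = pipe(
    "a studio product shot of a ceramic teapot",
    num_inference_steps=28,
    guidance_scale=4.5,
).images[0]
```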
Common Use Cases
- Photorealistic image generation
- Artistic illustration and concept art
- Typography and text-heavy designs
- Product visualization
- Fine-tuning and LoRA development
- ControlNet-guided generation
Key Parameters
- steps: 28-50 for Large, 4 for Large Turbo, 20-40 for Medium
- guidance_scale: 4.5-7.5 for Large/Medium, 0 for Large Turbo
- max_sequence_length: token limit for the T5 encoder (up to 256; the CLIP encoders are fixed at 77 tokens); higher values improve long-prompt understanding
- resolution: 1024×1024 native, flexible aspect ratios around 1MP
- negative_prompt: Text describing elements to exclude (has no effect with Turbo, where classifier-free guidance is disabled)
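The parameters above map directly onto a diffusers call; a sketch with Medium values (prompt and resolution are illustrative):

```python
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-medium",
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe(
    prompt="a poster with the words 'OPEN LATE' in bold neon letters",
    negative_prompt="blurry, washed out",  # has no effect if guidance_scale is 0
    num_inference_steps=40,                # 20-40 for Medium
    guidance_scale=5.5,                    # 4.5-7.5 for Large/Medium
    height=1024,
    width=1024,                            # ~1 MP native resolution
    max_sequence_length=256,               # T5 token limit
).images[0]
```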