
Stable Diffusion 3.5

Stable Diffusion 3.5 is Stability AI's text-to-image model family based on the Multimodal Diffusion Transformer (MMDiT) architecture with rectified flow matching.

Model Variants

Stable Diffusion 3.5 Large

  • 8.1 billion parameter MMDiT model
  • Highest quality and prompt adherence in the SD family
  • 1 megapixel native resolution (1024×1024)
  • 28-50 inference steps recommended
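To illustrate "1 megapixel native resolution with flexible aspect ratios", here is a small hypothetical helper (not part of any SD tooling) that picks a width/height near a target pixel count for a given aspect ratio. SD3.5 latent dimensions must be divisible by 16; snapping to 64 is a common conservative choice:

```python
import math

def sd35_resolution(aspect_ratio: float, megapixels: float = 1.0,
                    multiple: int = 64) -> tuple[int, int]:
    """Pick (width, height) near the target megapixel count for a given
    aspect ratio, snapped to `multiple` (hypothetical helper)."""
    target_px = megapixels * 1024 * 1024
    width = math.sqrt(target_px * aspect_ratio)
    height = width / aspect_ratio

    def snap(v: float) -> int:
        return max(multiple, round(v / multiple) * multiple)

    return snap(width), snap(height)

# sd35_resolution(1.0)    -> (1024, 1024)
# sd35_resolution(16 / 9) -> (1344, 768)
```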

Stable Diffusion 3.5 Large Turbo

  • Distilled version of SD 3.5 Large
  • 4-step inference for fast generation
  • Guidance scale of 0 (classifier-free guidance disabled)
  • Comparable quality to the full model in a fraction of the time
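A guidance scale of 0 means the classifier-free guidance mix is never evaluated, so the sampler also skips the unconditional forward pass, roughly halving per-step compute. A minimal sketch of the standard CFG formula that Turbo turns off:

```python
def cfg_combine(uncond: list[float], cond: list[float],
                guidance_scale: float) -> list[float]:
    """Classifier-free guidance mix over per-element model predictions.
    Toy sketch: when CFG is disabled (as for SD 3.5 Large Turbo), samplers
    use the conditional prediction alone instead of evaluating this."""
    return [u + guidance_scale * (c - u) for u, c in zip(uncond, cond)]

# cfg_combine([0.0], [1.0], 7.0) -> [7.0]  (prediction pushed toward cond)
```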

Stable Diffusion 3.5 Medium

  • 2.5 billion parameter MMDiT-X architecture
  • Designed for consumer hardware (9.9 GB VRAM for the transformer)
  • Dual attention blocks in first 12 transformer layers
  • Multi-resolution generation from 0.25 to 2 megapixels
  • Skip Layer Guidance recommended for better coherency

Key Features

  • Three text encoders: CLIP ViT-L, OpenCLIP ViT-bigG (77 tokens each), T5-XXL (256 tokens)
  • QK-normalization for stable training and easier fine-tuning
  • Rectified flow matching replaces traditional DDPM/DDIM sampling
  • Strong text rendering and typography in generated images
  • Diverse output styles (photography, 3D, painting, line art)
  • Highly customizable base for fine-tuning and LoRA training
  • T5-XXL encoder optional (can be removed to save memory with minimal quality loss)
  • Supports negative prompts for excluding unwanted elements
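Rectified flow matching trains a velocity field v(x, t) that transports noise (t=1) toward data (t=0) along near-straight paths, so sampling reduces to plain ODE integration. A toy Euler sampler under an idealized constant-velocity field (real samplers use the learned transformer as `velocity_fn`):

```python
def euler_rectified_flow(x: float, velocity_fn, steps: int) -> float:
    """Integrate dx/dt = v(x, t) from t = 1 (noise) down to t = 0 (data)
    with fixed-size Euler steps. Toy 1-D sketch, not a production sampler."""
    dt = 1.0 / steps
    t = 1.0
    for _ in range(steps):
        x = x - dt * velocity_fn(x, t)
        t -= dt
    return x

# With the ideal straight-line field v = x1 - x0, any step count recovers
# the data point x0 exactly from the noise point x1.
```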

Hardware Requirements

  • Large: 24GB+ VRAM recommended (fp16), quantizable to fit smaller GPUs
  • Large Turbo: 16GB+ VRAM recommended
  • Medium: 10GB VRAM minimum (excluding text encoders)
  • NF4 quantization available via bitsandbytes for low-VRAM GPUs
  • CPU offloading supported via diffusers pipeline
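The VRAM figures above follow from simple arithmetic on parameter count and bytes per parameter. A back-of-the-envelope estimator (weights only; activations, KV buffers, and text encoders push real usage higher):

```python
def weight_gib(params_billions: float, bytes_per_param: float) -> float:
    """Rough weight memory in GiB -- ignores activations and text encoders,
    so actual VRAM use is higher than this estimate."""
    return params_billions * 1e9 * bytes_per_param / 2**30

# SD3.5 Large transformer (8.1B params):
#   fp16 (2 bytes/param)       -> ~15.1 GiB
#   NF4  (~0.5 byte/param)     -> ~3.8 GiB
# SD3.5 Medium transformer (2.5B params), fp16 -> ~4.7 GiB
```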

Common Use Cases

  • Photorealistic image generation
  • Artistic illustration and concept art
  • Typography and text-heavy designs
  • Product visualization
  • Fine-tuning and LoRA development
  • ControlNet-guided generation

Key Parameters

  • steps: 28-50 for Large, 4 for Large Turbo, 20-40 for Medium
  • guidance_scale: 4.5-7.5 for Large/Medium, 0 for Large Turbo
  • max_sequence_length: T5 token limit (lower values like 77 save memory; 256 allows longer prompts and better prompt understanding; the CLIP encoders are fixed at 77)
  • resolution: 1024×1024 native, flexible aspect ratios around 1MP
  • negative_prompt: Text describing elements to exclude (has no effect in Large Turbo, since classifier-free guidance is disabled)
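The recommendations above can be condensed into a small lookup table. A hypothetical helper (the mid-range values are one reasonable choice within the recommended ranges, not official defaults):

```python
# Hypothetical defaults condensing the recommended ranges above.
SD35_DEFAULTS = {
    "large":       {"steps": 40, "guidance_scale": 5.5},  # range: 28-50 / 4.5-7.5
    "large-turbo": {"steps": 4,  "guidance_scale": 0.0},  # CFG disabled
    "medium":      {"steps": 30, "guidance_scale": 5.5},  # range: 20-40 / 4.5-7.5
}

def sd35_params(variant: str) -> dict:
    """Return suggested sampler settings for an SD 3.5 variant."""
    try:
        return dict(SD35_DEFAULTS[variant])
    except KeyError:
        raise ValueError(f"unknown SD 3.5 variant: {variant!r}") from None
```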