mirror of
https://github.com/Comfy-Org/ComfyUI_frontend.git
synced 2026-04-20 14:30:41 +00:00
Migrate workflow_templates/site into the frontend monorepo as apps/hub so the hub can use @comfyorg/design-system and shared packages. Changes to existing files: - pnpm-workspace.yaml: add @astrojs/sitemap, @astrojs/vercel, lucide-vue-next - eslint.config.ts: add hub ignores and i18n/import rule overrides - .oxlintrc.json: add hub scripts to ignore patterns - knip.config.ts: add hub workspace config apps/hub adaptations from source: - Replace local cn() with @comfyorg/tailwind-utils (19 files) - Integrate @comfyorg/design-system/css/base.css in global.css - Make TEMPLATES_DIR configurable via HUB_TEMPLATES_DIR env var - Add HUB_SKIP_SYNC flag for builds without template data - Remove Vite 8-incompatible rollupOptions.output.manualChunks - Fix stylelint violations (modern color notation, number precision) - Gitignore generated content (thumbnails, synced templates, AI cache)
4.0 KiB
4.0 KiB
Wan
Wan is a family of open-source video generation models from Alibaba's Tongyi Lab, spanning text-to-video, image-to-video, speech-to-video, motion control, and video editing. All models are released under the Apache 2.0 license.
Model Variants
Wan 2.1 T2V / I2V
- Text-to-video and image-to-video generation
- Available in 1.3B and 14B parameter sizes
- Supports 480p and 720p output, variable aspect ratios
- Chinese and English visual text generation
Wan 2.1 Fun (Control / InPaint / Camera)
- Camera control with predefined or custom camera movements
- Video inpainting for targeted frame-level editing
- Depth, pose, and canny edge control for guided generation
Wan 2.1 VACE (Video Any-Condition Editing)
- All-in-one model for video creation and editing (ICCV 2025)
- Reference-to-video (R2V), video-to-video (V2V), and masked editing (MV2V)
- Supports inpainting, outpainting, first-last-frame interpolation, and animate-anything
- Available in 1.3B and 14B sizes, built on Wan 2.1 base models
Wan 2.2 T2V / I2V / TI2V
- Mixture-of-Experts (MoE) architecture with high-noise and low-noise expert models
- T2V-A14B and I2V-A14B (14B MoE), TI2V-5B (hybrid text+image-to-video)
- Cinematic-level aesthetic control with lighting, composition, and color tone guidance
- TI2V-5B uses a high-compression 16×16×4 VAE, runs on consumer GPUs like 4090
Wan 2.2 S2V (Speech-to-Video)
- Audio-driven cinematic video generation from image + speech + text
- Supports lip-sync, facial expressions, and pose-driven generation
- Generates variable-length videos matching input audio duration
Wan 2.2 Animate
- Character animation and subject replacement from video + reference image
- Animate mode: transfers motion from reference video onto a still character
- Replace mode: swaps subjects while preserving background, lighting, and camera motion
- Includes relighting LoRA for scene-matched lighting adaptation
Wan Move
- Point-level motion control for image-to-video generation (NeurIPS 2025)
- Dense trajectory-based guidance for fine-grained object motion
- Latent trajectory propagation without extra motion modules
- 14B model generating 5-second 480p videos
Key Features
- High temporal consistency and natural physics simulation
- Multiple aspect ratios (16:9, 9:16, 1:1) at 24fps
- MoE architecture in 2.2 for higher quality at same compute cost
- Bilingual prompt support (Chinese and English)
- ComfyUI and Diffusers integration across all variants
Hardware Requirements
- 1.3B models: 8GB VRAM minimum
- 14B models: 24GB+ VRAM recommended (80GB for full precision)
- TI2V-5B: runs on consumer 4090 GPUs at 720p
- FP8 quantization available for lower VRAM configurations
- Multi-GPU inference supported via FSDP + DeepSpeed Ulysses
Common Use Cases
- Social media and short-form video content
- Character animation and motion transfer
- Video inpainting and scene editing
- Product animation and marketing videos
- Speech-driven talking head generation
- Storyboard-to-video conversion
Key Parameters
- frames: Number of output frames (typically 81 for ~3.4s at 24fps)
- steps: Inference steps (20-50 recommended)
- cfg_scale: Guidance scale for prompt adherence (3-7 typical)
- size: Output resolution (480p or 720p)
- model_name: Selects variant (e.g., vace-14B, ti2v-5B, s2v-14B)
Blog References
- Wan 2.1 Video Model Native Support — Initial release with 4 model variants, 8.19GB VRAM minimum
- Wan 2.1 VACE Native Support — Unified video editing: Move/Swap/Reference/Expand/Animate Anything
- Wan 2.2 Day-0 Support — MoE architecture, Apache 2.0 license, cinematic controls
- WAN 2.6 Reference-to-Video — Generate videos from reference clips at up to 1080p
- The Complete AI Upscaling Handbook — Wan 2.2 used for creative video upscaling