Qwen

Qwen is Alibaba's family of vision-language and image models, spanning visual understanding, instruction-based image editing, and text-to-image generation.

Model Variants

Qwen2.5-VL

  • Multimodal vision-language model from the Qwen team
  • Available in 3B, 7B, and 72B parameter sizes
  • Image understanding, video comprehension (1+ hour videos), and visual localization
  • Visual agent capabilities: computer use, phone use, dynamic tool calling
  • Structured output generation for invoices, forms, and tables
  • Dynamic resolution and frame rate training for video understanding
  • Optimized ViT encoder with window attention, SwiGLU, and RMSNorm
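A minimal inference sketch using the Hugging Face transformers integration (Qwen2_5_VLForConditionalGeneration) together with the qwen_vl_utils helper package; the model ID matches the released 7B Instruct checkpoint, while the image URL and prompt are placeholders:

```python
import torch
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
from qwen_vl_utils import process_vision_info  # helper shipped alongside Qwen2.5-VL

model_id = "Qwen/Qwen2.5-VL-7B-Instruct"
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# Placeholder image and prompt; "video" entries use the same message format.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "https://example.com/photo.png"},
            {"type": "text", "text": "Describe this image in one sentence."},
        ],
    }
]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs,
    padding=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=128)
trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, output_ids)]
print(processor.batch_decode(trimmed, skip_special_tokens=True)[0])
```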

Qwen-Image-Edit

  • Specialized image editing model with instruction-following
  • Supports inpainting, outpainting, style transfer, and content-aware edits
  • 11 workflow templates available
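Outside of the workflow templates, a hedged sketch of driving an edit from Python, assuming the diffusers integration published for Qwen-Image-Edit; the pipeline resolution, call arguments, paths, and prompt are assumptions, not the templates' method:

```python
import torch
from diffusers import DiffusionPipeline
from PIL import Image

# Assumption: the Qwen/Qwen-Image-Edit repo resolves to an edit pipeline in a
# recent diffusers release; file paths and the prompt are placeholders.
pipe = DiffusionPipeline.from_pretrained("Qwen/Qwen-Image-Edit", torch_dtype=torch.bfloat16)
pipe.to("cuda")

source = Image.open("input.png").convert("RGB")
edited = pipe(image=source, prompt="Replace the sky with a sunset").images[0]
edited.save("edited.png")
```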

Qwen-Image

  • Text-to-image generation model from the Qwen family
  • 7 workflow templates available
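A text-to-image sketch along the same lines, assuming the diffusers integration for Qwen-Image; the prompt, step count, and output path are illustrative values:

```python
import torch
from diffusers import DiffusionPipeline

# Assumption: Qwen/Qwen-Image loads through DiffusionPipeline in a recent
# diffusers release; sampler settings are placeholders.
pipe = DiffusionPipeline.from_pretrained("Qwen/Qwen-Image", torch_dtype=torch.bfloat16)
pipe.to("cuda")

image = pipe(
    prompt="A watercolor painting of a lighthouse at dawn",
    num_inference_steps=50,
).images[0]
image.save("qwen_image.png")
```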

Qwen-Image-Layered

  • Layered image generation for composable outputs
  • Generates images with separate foreground/background layers
  • 2 workflow templates available

Qwen-Image 2512

  • Qwen-Image variant tuned for specific generation tasks
  • 1 workflow template available

Key Features

  • Strong visual understanding with state-of-the-art benchmark results
  • Native multi-language support including Chinese and English
  • Visual agent capabilities for computer and phone interaction
  • Video event capture with temporal segment pinpointing
  • Bounding box and point-based visual localization
  • Structured JSON output for document and table extraction
  • Instruction-based image editing with precise control
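A prompt-shape sketch for the structured-output and localization features above, reusing the chat-message format from the Qwen2.5-VL example; the image path and the requested JSON schema (bbox_2d pixel coordinates) are illustrative, not a fixed API contract:

```python
# Placeholder document image; the prompt asks the model to return only JSON
# bounding boxes, which can then be parsed downstream.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "invoice.png"},
            {
                "type": "text",
                "text": (
                    "Find every table in this document and return JSON only, "
                    'formatted as [{"bbox_2d": [x1, y1, x2, y2], "label": "table"}].'
                ),
            },
        ],
    }
]
```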

Hardware Requirements

  • 3B model: 6-8GB VRAM
  • 7B model: 16GB VRAM, flash_attention_2 recommended for multi-image/video
  • 72B model: Multi-GPU setup required (80GB+ per GPU)
  • Context length: 32,768 tokens default, extendable to 64K+ with YaRN
  • Dynamic pixel budget: 256-1280 tokens per image (configurable min/max pixels)
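A memory-conscious loading sketch for the 7B checkpoint; bfloat16, device_map="auto", and FlashAttention 2 are assumptions here rather than hard requirements:

```python
import torch
from transformers import Qwen2_5_VLForConditionalGeneration

# Sketch: keep the 7B model within a single 16-24 GB GPU budget.
# flash_attention_2 needs the flash-attn package and a supported GPU;
# pass "sdpa" instead if it is not installed.
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-7B-Instruct",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    device_map="auto",
)
```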

Common Use Cases

  • Image editing based on text instructions
  • Visual question answering and image description
  • Long video comprehension and event extraction
  • Document OCR and structured data extraction
  • Visual agent tasks (screen interaction, UI navigation)
  • Layered image generation for design workflows
  • Text-to-image generation with strong prompt following

Key Parameters

  • max_new_tokens: Controls output length for VL model responses
  • min_pixels / max_pixels: Control image token budget (e.g. 256x28x28 to 1280x28x28)
  • temperature: Generation diversity for text outputs
  • resized_height / resized_width: Direct image dimension control (rounded to nearest 28)
  • fps: Frame rate for video input processing in Qwen2.5-VL
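A sketch of how these knobs are typically wired through the processor and per-input message fields, assuming the transformers / qwen_vl_utils interface; the concrete values and file paths are illustrative:

```python
from transformers import AutoProcessor

# Global pixel budget: images are resized into this token range
# (one visual token covers a 28x28 pixel patch).
processor = AutoProcessor.from_pretrained(
    "Qwen/Qwen2.5-VL-7B-Instruct",
    min_pixels=256 * 28 * 28,
    max_pixels=1280 * 28 * 28,
)

# Per-input overrides: resized dimensions are rounded to multiples of 28,
# and fps controls how densely a video is sampled.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "photo.jpg", "resized_height": 280, "resized_width": 420},
            {"type": "video", "video": "clip.mp4", "fps": 1.0},
            {"type": "text", "text": "Summarize what happens in the clip."},
        ],
    }
]
```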

Blog References