# SGLang Diffusion

SGLang Diffusion is a high-performance inference framework for image and video generation. It provides native SGLang pipelines, diffusers backend support, an OpenAI-compatible server, and an optimized kernel stack built on both precompiled `sgl-kernel` operators and JIT kernels for key inference paths.

## Key Features

- Broad model support across Wan, Hunyuan, Qwen-Image, FLUX, Z-Image, GLM-Image, and more
- Fast inference with `sgl-kernel`, JIT kernels, scheduler improvements, and caching acceleration
- Multiple interfaces: `sglang generate`, `sglang serve`, and an OpenAI-compatible API
- Multi-platform support for NVIDIA, AMD, Intel XPU, Ascend, Apple Silicon, and Moore Threads

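## Quick Start

Install SGLang with the diffusion extras:

```bash
uv pip install "sglang[diffusion]" --prerelease=allow
```

Run a one-off generation job and save the result:

```bash
sglang generate --model-path Qwen/Qwen-Image \
  --prompt "A beautiful sunset over the mountains" \
  --save-output
```

Or launch a persistent server on port 30010:

```bash
sglang serve --model-path Qwen/Qwen-Image --port 30010
```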
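Once the server is running, a request can be sent over HTTP. The following is a minimal sketch, not a definitive client: it assumes the server started with `sglang serve` is reachable at `localhost:30010` and exposes an OpenAI-style `/v1/images/generations` route, and that the request fields (`prompt`, `n`, `response_format`) follow the OpenAI Images API convention. The `BASE_URL` and `PAYLOAD` variables and the `response.json` output file are illustrative names. Check the [OpenAI-Compatible API](api/openai_api.md) page for the routes and fields that are actually supported.

```shell
#!/usr/bin/env bash
# Assumption: the server listens on localhost:30010 (the port used with
# `sglang serve`) and serves an OpenAI-style images route.
BASE_URL="${BASE_URL:-http://localhost:30010}"
PROMPT="A beautiful sunset over the mountains"

# Build the JSON request body. Field names follow the OpenAI Images API
# convention and are an assumption here, not confirmed SGLang behavior.
PAYLOAD=$(printf '{"prompt": "%s", "n": 1, "response_format": "b64_json"}' "$PROMPT")

# Send the request and store the raw JSON response. If the server is not
# reachable, print a hint instead of failing silently.
curl -sS --connect-timeout 5 --max-time 600 \
  -X POST "$BASE_URL/v1/images/generations" \
  -H "Content-Type: application/json" \
  -d "$PAYLOAD" \
  -o response.json \
  || echo "request failed: is the server running at $BASE_URL?"
```

Set `BASE_URL` in the environment to target a different host or port. Image generation can take a while on large models, which is why the sketch uses a generous `--max-time`.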

## Start Here

- [Installation](installation.md): install SGLang Diffusion and platform dependencies
- [Compatibility Matrix](compatibility_matrix.md): check model, optimization, and component override support
- [CLI](api/cli.md): run one-off generation jobs or launch a persistent server
- [OpenAI-Compatible API](api/openai_api.md): send image and video requests to the HTTP server
- [Attention Backends](performance/attention_backends.md): choose the best backend for your model and hardware
- [Caching Acceleration](performance/cache/index.md): use Cache-DiT or TeaCache to reduce denoising cost
- [Quantization](quantization.md): load quantized transformer checkpoints
- [Contributing](contributing.md): contribution workflow, adding new models, and CI performance baselines

## Additional Documentation

- [Post-Processing](api/post_processing.md): frame interpolation and upscaling
- [Performance Overview](performance/index.md): overview of attention, caching, and profiling
- [Environment Variables](environment_variables.md): platform, caching, storage, and debugging configuration
- [Support New Models](support_new_models.md): implementation guide for new diffusion pipelines
- [CI Performance](ci_perf.md): performance baseline generation

## References

- [SGLang GitHub](https://github.com/sgl-project/sglang)
- [Cache-DiT](https://github.com/vipshop/cache-dit)
- [FastVideo](https://github.com/hao-ai-lab/FastVideo)
- [xDiT](https://github.com/xdit-project/xDiT)
- [Diffusers](https://github.com/huggingface/diffusers)