mirror of
https://github.com/kvcache-ai/sglang.git
synced 2026-06-30 19:57:52 +00:00
54 lines
2.5 KiB
Markdown
54 lines
2.5 KiB
Markdown
# SGLang Diffusion
|
|
|
|
SGLang Diffusion is a high-performance inference framework for image and video generation. It provides native SGLang pipelines, diffusers backend support, an OpenAI-compatible server, and an optimized kernel stack built on both precompiled `sgl-kernel` operators and JIT kernels for key inference paths.
|
|
|
|
## Key Features
|
|
|
|
- Broad model support across Wan, Hunyuan, Qwen-Image, FLUX, Z-Image, GLM-Image, and more
|
|
- Fast inference with `sgl-kernel`, JIT kernels, scheduler improvements, and caching acceleration
|
|
- Multiple interfaces: `sglang generate`, `sglang serve`, and an OpenAI-compatible API
|
|
- Multi-platform support for NVIDIA, AMD, Intel XPU, Ascend, Apple Silicon, and Moore Threads
|
|
|
|
## Quick Start
|
|
|
|
```bash
|
|
uv pip install "sglang[diffusion]" --prerelease=allow
|
|
```
|
|
|
|
```bash
|
|
sglang generate --model-path Qwen/Qwen-Image \
|
|
--prompt "A beautiful sunset over the mountains" \
|
|
--save-output
|
|
```
|
|
|
|
```bash
|
|
sglang serve --model-path Qwen/Qwen-Image --port 30010
|
|
```
|
|
|
|
## Start Here
|
|
|
|
- [Installation](installation.md): install SGLang Diffusion and platform dependencies
|
|
- [Compatibility Matrix](compatibility_matrix.md): check model, optimization, and component override support
|
|
- [CLI](api/cli.md): run one-off generation jobs or launch a persistent server
|
|
- [OpenAI-Compatible API](api/openai_api.md): send image and video requests to the HTTP server
|
|
- [Attention Backends](performance/attention_backends.md): choose the best backend for your model and hardware
|
|
- [Caching Acceleration](performance/cache/index.md): use Cache-DiT or TeaCache to reduce denoising cost
|
|
- [Quantization](quantization.md): load quantized transformer checkpoints
|
|
- [Contributing](contributing.md): contribution workflow, adding new models, and CI perf baselines
|
|
|
|
## Additional Documentation
|
|
|
|
- [Post-Processing](api/post_processing.md): frame interpolation and upscaling
|
|
- [Performance Overview](performance/index.md): overview of attention, caching, and profiling
|
|
- [Environment Variables](environment_variables.md): platform, caching, storage, and debugging configuration
|
|
- [Support New Models](support_new_models.md): implementation guide for new diffusion pipelines
|
|
- [CI Performance](ci_perf.md): performance baseline generation
|
|
|
|
## References
|
|
|
|
- [SGLang GitHub](https://github.com/sgl-project/sglang)
|
|
- [Cache-DiT](https://github.com/vipshop/cache-dit)
|
|
- [FastVideo](https://github.com/hao-ai-lab/FastVideo)
|
|
- [xDiT](https://github.com/xdit-project/xDiT)
|
|
- [Diffusers](https://github.com/huggingface/diffusers)
|