# Performance
This section covers the main performance levers for SGLang Diffusion: attention backends, caching acceleration, and profiling.
## Overview
| Optimization | Type | Description |
|---|---|---|
| Cache-DiT | Caching | Block-level caching with DBCache, TaylorSeer, and SCM |
| TeaCache | Caching | Timestep-level caching based on temporal similarity |
| Attention Backends | Kernel | Optimized attention implementations (FlashAttention, SageAttention, etc.) |
| Profiling | Diagnostics | PyTorch Profiler and Nsight Systems guidance |
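Here is a minimal sketch of block-level caching in the spirit of DBCache. It assumes a model is a plain list of blocks, each mapping a vector to a vector.

- The first `fn` blocks always run and act as a probe.
- If their combined residual has changed by less than `threshold` (relative L1 distance) since the last fully computed step, the middle blocks are skipped and their cached residual is added instead.
- The last `bn` blocks always run.

The class `DualBlockCacheSketch`, its parameters, and the list-of-floats representation are hypothetical illustrations, not Cache-DiT's or SGLang's API. TaylorSeer and SCM are not modeled.

```python
from typing import Callable, List, Optional

Vector = List[float]
Block = Callable[[Vector], Vector]


def rel_l1(a: Vector, b: Vector) -> float:
    """Relative L1 distance ||a - b||_1 / ||b||_1."""
    num = sum(abs(x - y) for x, y in zip(a, b))
    den = sum(abs(y) for y in b)
    return num / den if den > 0 else float("inf")


class DualBlockCacheSketch:
    """Hypothetical DBCache-style runner (illustration only, not Cache-DiT's API).

    Always computes the first `fn` and last `bn` blocks; skips the middle
    blocks when the first blocks' residual barely changed since the last
    fully computed step, reusing the middle blocks' cached residual.
    """

    def __init__(self, blocks: List[Block], fn: int, bn: int, threshold: float):
        assert fn >= 1 and fn + bn <= len(blocks)
        self.blocks = blocks
        self.fn = fn
        self.bn = bn
        self.threshold = threshold
        self.prev_front_residual: Optional[Vector] = None  # from last computed step
        self.cached_middle_residual: Optional[Vector] = None
        self.skipped_steps = 0

    def step(self, x: Vector) -> Vector:
        # Probe: always run the first `fn` blocks.
        h = x
        for blk in self.blocks[: self.fn]:
            h = blk(h)
        front_residual = [a - b for a, b in zip(h, x)]

        middle = self.blocks[self.fn : len(self.blocks) - self.bn]
        reuse = (
            self.prev_front_residual is not None
            and self.cached_middle_residual is not None
            and rel_l1(front_residual, self.prev_front_residual) < self.threshold
        )
        if reuse:
            # Approximate the middle blocks by adding their cached residual.
            h = [a + r for a, r in zip(h, self.cached_middle_residual)]
            self.skipped_steps += 1
        else:
            # Full computation: refresh both the probe reference and the cache.
            start = h
            for blk in middle:
                h = blk(h)
            self.cached_middle_residual = [a - b for a, b in zip(h, start)]
            self.prev_front_residual = front_residual

        # Always run the last `bn` blocks.
        for blk in self.blocks[len(self.blocks) - self.bn :]:
            h = blk(h)
        return h
```

Consider three toy blocks `x*2`, `x+1`, and `x*3`, with `fn=1`, `bn=1`, and `threshold=0.05`. A first `step([1.0, 1.0])` computes every block. A second call with a slightly perturbed input then reuses the middle block's cached residual and increments `skipped_steps`.

The probe is compared against the residual from the last *computed* step, not the immediately preceding one. This keeps a series of small drifts from slipping through unnoticed.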
## Start Here
- Use Attention Backends to choose the best backend for your model and hardware.
- Use Caching Acceleration to reduce denoising cost with Cache-DiT or TeaCache.
- Use Profiling when you need to diagnose a bottleneck rather than guess.
## Caching at a Glance
- Cache-DiT provides block-level caching for diffusers pipelines and is geared toward tuning for higher speedups.
- TeaCache provides timestep-level caching built into SGLang model families.
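Timestep-level caching instead decides whether to skip the *entire* denoiser at a given step. The sketch below follows the general TeaCache idea as we understand it:

- Accumulate the relative L1 change of a timestep-conditioned input across consecutive steps.
- While the running total stays under a threshold, reuse the cached output residual.
- Once the total crosses the threshold, recompute and reset it.

The class `TimestepCacheSketch`, the `modulated` argument, and `force_compute` (for example, for the first or last step) are hypothetical names, not SGLang's TeaCache interface. The published method also rescales the distance with a fitted, model-specific polynomial; that step is omitted here.

```python
from typing import Callable, List, Optional

Vector = List[float]


def rel_l1(a: Vector, b: Vector) -> float:
    """Relative L1 distance ||a - b||_1 / ||b||_1."""
    num = sum(abs(x - y) for x, y in zip(a, b))
    den = sum(abs(y) for y in b)
    return num / den if den > 0 else float("inf")


class TimestepCacheSketch:
    """Hypothetical TeaCache-style wrapper around a whole denoiser step."""

    def __init__(self, model: Callable[[Vector], Vector], threshold: float):
        self.model = model
        self.threshold = threshold
        self.accumulated = 0.0  # running sum of relative L1 changes
        self.prev_modulated: Optional[Vector] = None
        self.cached_residual: Optional[Vector] = None  # last model(x) - x
        self.skipped_steps = 0

    def step(self, x: Vector, modulated: Vector, force_compute: bool = False) -> Vector:
        should_compute = True
        if (
            not force_compute
            and self.prev_modulated is not None
            and self.cached_residual is not None
        ):
            # Accumulate how much the conditioned input moved since the last step.
            self.accumulated += rel_l1(modulated, self.prev_modulated)
            should_compute = self.accumulated >= self.threshold
        # The next step is always compared against this step's input.
        self.prev_modulated = modulated

        if should_compute:
            self.accumulated = 0.0
            out = self.model(x)
            self.cached_residual = [o - i for o, i in zip(out, x)]
            return out

        # Skip the model: approximate its output with the cached residual.
        assert self.cached_residual is not None
        self.skipped_steps += 1
        return [i + r for i, r in zip(x, self.cached_residual)]
```

The distance is accumulated rather than compared step by step. As a result, a run of individually small changes eventually forces a recomputation, which bounds how long a stale residual can be reused.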
Pages in this section:

- attention_backends
- cache/index
- profiling
## Current Baseline Snapshot
For Ring SP benchmark details, see: