sglang/docs/diffusion/performance/index.md
yuefeng Wu a20d12ae96 [diffusion][doc]: add ring sp performance benchmark page (#20998)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-03-30 20:26:05 +03:00

Performance

This section covers the main performance levers for SGLang Diffusion: attention backends, caching acceleration, and profiling.

Overview

| Optimization | Type | Description |
|---|---|---|
| Cache-DiT | Caching | Block-level caching with DBCache, TaylorSeer, and SCM |
| TeaCache | Caching | Timestep-level caching based on temporal similarity |
| Attention Backends | Kernel | Optimized attention implementations (FlashAttention, SageAttention, etc.) |
| Profiling | Diagnostics | PyTorch Profiler and Nsight Systems guidance |

Start Here

Caching at a Glance

  • Cache-DiT provides block-level caching for diffusers pipelines and is the option to reach for when you want more aggressive, speedup-oriented tuning.
  • TeaCache is timestep-level caching built into SGLang model families.
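
To make the idea of timestep-level caching concrete, here is a minimal, framework-free sketch. It is not the TeaCache or SGLang implementation: the `TimestepCache` class, the `rel_l1` helper, the `threshold` parameter, and the toy `model` are all hypothetical names chosen for illustration, and the accumulated relative-L1 test is a simplified stand-in for "temporal similarity". The sketch reuses the last computed residual while consecutive inputs stay similar and recomputes once the accumulated change crosses the threshold.

```python
from typing import Callable, List, Optional

Vector = List[float]


def rel_l1(curr: Vector, prev: Vector) -> float:
    """Relative L1 distance between two inputs (hypothetical similarity metric)."""
    num = sum(abs(c - p) for c, p in zip(curr, prev))
    den = sum(abs(p) for p in prev) or 1e-8  # avoid division by zero
    return num / den


class TimestepCache:
    """Toy timestep-level cache. Illustrative only, not SGLang's API."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold  # assumed tuning knob
        self.prev_input: Optional[Vector] = None
        self.cached_residual: Optional[Vector] = None
        self.accumulated = 0.0
        self.computed_steps = 0

    def step(self, x: Vector, model: Callable[[Vector], Vector]) -> Vector:
        if self.prev_input is not None and self.cached_residual is not None:
            # Accumulate how far the input has drifted since the last full compute.
            self.accumulated += rel_l1(x, self.prev_input)
            self.prev_input = list(x)
            if self.accumulated < self.threshold:
                # Inputs are still similar: skip the model, reuse the residual.
                return [xi + ri for xi, ri in zip(x, self.cached_residual)]
        # Full compute: run the model and refresh the cached residual.
        self.prev_input = list(x)
        out = model(x)
        self.cached_residual = [o - xi for o, xi in zip(out, x)]
        self.accumulated = 0.0
        self.computed_steps += 1
        return out


def model(x: Vector) -> Vector:
    """Stand-in for an expensive denoiser forward pass."""
    return [2.0 * v for v in x]


cache = TimestepCache(threshold=0.05)
inputs = [[1.0, 1.0], [1.01, 1.01], [1.02, 1.02], [2.0, 2.0]]
outputs = [cache.step(x, model) for x in inputs]
```

In this toy run the second and third steps fall under the threshold and reuse the cached residual, while the first and last steps run the model. A lower threshold trades speed for fidelity; a higher one skips more steps.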

attention_backends
cache/index
profiling

Current Baseline Snapshot

For Ring SP benchmark details, see:

References