mirror of https://github.com/kvcache-ai/sglang.git synced 2026-07-01 04:08:10 +00:00

yuefeng Wu a20d12ae96 [diffusion][doc]: add ring sp performance benchmark page (#20998)

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

2026-03-30 20:26:05 +03:00

Performance

This section covers the main performance levers for SGLang Diffusion: attention backends, caching acceleration, and profiling.

Overview

| Optimization | Type | Description |
| --- | --- | --- |
| Cache-DiT | Caching | Block-level caching with DBCache, TaylorSeer, and SCM |
| TeaCache | Caching | Timestep-level caching based on temporal similarity |
| Attention Backends | Kernel | Optimized attention implementations (FlashAttention, SageAttention, etc.) |
| Profiling | Diagnostics | PyTorch Profiler and Nsight Systems guidance |

Cache-DiT provides block-level caching for diffusers pipelines and is the option to choose when tuning for higher speedups.
TeaCache provides timestep-level caching and is built into SGLang model families.
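
To make the idea of timestep-level caching concrete, the following is a minimal, framework-free sketch of the general technique: accumulate the relative change of the model input across denoising steps and reuse the previous output while that accumulated change stays below a threshold. It is an illustration only, not SGLang's or TeaCache's actual implementation. The names `TimestepCache`, `rel_l1`, `demo`, and the `threshold` parameter are hypothetical, and plain Python lists stand in for tensors.

```python
from typing import Callable, List, Optional, Tuple

Vector = List[float]


def rel_l1(curr: Vector, prev: Vector) -> float:
    """Relative L1 distance between two equally sized vectors."""
    num = sum(abs(c - p) for c, p in zip(curr, prev))
    den = sum(abs(p) for p in prev) or 1e-12  # guard against division by zero
    return num / den


class TimestepCache:
    """Toy timestep-level cache (hypothetical helper, not an SGLang API)."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold
        self.prev_input: Optional[Vector] = None
        self.cached_output: Optional[Vector] = None
        self.accumulated = 0.0   # change accumulated since the last full compute
        self.computed_steps = 0  # number of times the model was actually run

    def step(self, x: Vector, model: Callable[[Vector], Vector]) -> Vector:
        if self.prev_input is not None and self.cached_output is not None:
            self.accumulated += rel_l1(x, self.prev_input)
            if self.accumulated < self.threshold:
                # Input barely moved: skip the model and reuse the cached output.
                self.prev_input = x
                return self.cached_output
        # First step, or the input drifted too far: recompute and reset.
        self.accumulated = 0.0
        self.prev_input = x
        self.cached_output = model(x)
        self.computed_steps += 1
        return self.cached_output


def demo() -> Tuple[List[Vector], int]:
    """Run five fake denoising steps through a stand-in 'model'."""
    model = lambda x: [2.0 * v for v in x]  # placeholder for a transformer forward pass
    inputs = [[1.0, 1.0], [1.01, 1.01], [1.02, 1.02], [1.5, 1.5], [1.51, 1.51]]
    cache = TimestepCache(threshold=0.05)
    outputs = [cache.step(x, model) for x in inputs]
    return outputs, cache.computed_steps


if __name__ == "__main__":
    outputs, computed = demo()
    print(f"model ran on {computed} of {len(outputs)} steps")
```

In this toy run only the first step and the step with the large input jump trigger a real forward pass. A higher threshold skips more steps at the cost of reusing staler outputs, which is the speed versus quality trade-off that caching thresholds generally control.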

Pages in this section:

attention_backends
cache/index
profiling

For Ring SP benchmark details, see: