Files
sglang/docs/diffusion/environment_variables.md
2026-04-17 16:23:46 +08:00

6.8 KiB

Environment Variables

Runtime

Environment Variable Default Description
SGLANG_DIFFUSION_TARGET_DEVICE cuda Target device for inference (cuda, rocm, xpu, npu, musa, mps, cpu)
SGLANG_DIFFUSION_ATTENTION_BACKEND not set Override attention backend via env var (e.g. fa, torch_sdpa, sage_attn)
SGLANG_DIFFUSION_ATTENTION_CONFIG not set Path to attention backend configuration file (JSON/YAML)
SGLANG_DIFFUSION_STAGE_LOGGING false Enable per-stage timing logs
SGLANG_DIFFUSION_SERVER_DEV_MODE false Enable dev-only HTTP endpoints for debugging
SGLANG_DIFFUSION_TORCH_PROFILER_DIR not set Directory for torch profiler traces (absolute path). Enables profiling when set
SGLANG_DIFFUSION_CACHE_ROOT ~/.cache/sgl_diffusion Root directory for cache files
SGLANG_DIFFUSION_CONFIG_ROOT ~/.config/sgl_diffusion Root directory for configuration files
SGLANG_DIFFUSION_LOGGING_LEVEL INFO Default logging level
SGLANG_DIFFUSION_WORKER_MULTIPROC_METHOD fork Multiprocess context for workers (fork or spawn)
SGLANG_USE_RUNAI_MODEL_STREAMER true Use Run:AI model streamer for model loading

Platform-Specific

Apple MPS

Environment Variable Default Description
SGLANG_USE_MLX not set Set to 1 to enable MLX fused Metal kernels for norm ops on MPS

ROCm (AMD GPUs)

Environment Variable Default Description
SGLANG_USE_ROCM_VAE false Use AITer GroupNorm in VAE for improved performance on ROCm
SGLANG_USE_ROCM_CUDNN_BENCHMARK false Enable MIOpen auto-tuning for VAE conv layers on ROCm

Quantization

Environment Variable Default Description
SGLANG_DIFFUSION_FLASHINFER_FP4_GEMM_BACKEND not set FlashInfer FP4 GEMM backend for generic NVFP4 fallback

Caching Acceleration

These variables configure caching acceleration for Diffusion Transformer (DiT) models. SGLang supports multiple caching strategies - see caching documentation for an overview.

Cache-DiT Configuration

See cache-dit documentation for detailed configuration.

Environment Variable Default Description
SGLANG_CACHE_DIT_ENABLED false Enable Cache-DiT acceleration
SGLANG_CACHE_DIT_FN 1 First N blocks to always compute
SGLANG_CACHE_DIT_BN 0 Last N blocks to always compute
SGLANG_CACHE_DIT_WARMUP 4 Warmup steps before caching
SGLANG_CACHE_DIT_RDT 0.24 Residual difference threshold
SGLANG_CACHE_DIT_MC 3 Max continuous cached steps
SGLANG_CACHE_DIT_TAYLORSEER false Enable TaylorSeer calibrator
SGLANG_CACHE_DIT_TS_ORDER 1 TaylorSeer order (1 or 2)
SGLANG_CACHE_DIT_SCM_PRESET none SCM preset (none/slow/medium/fast/ultra)
SGLANG_CACHE_DIT_SCM_POLICY dynamic SCM caching policy
SGLANG_CACHE_DIT_SCM_COMPUTE_BINS not set Custom SCM compute bins
SGLANG_CACHE_DIT_SCM_CACHE_BINS not set Custom SCM cache bins

Cache-DiT Secondary Transformer

For dual-transformer models (e.g., Wan2.2 with high/low-noise experts), these variables configure caching for the secondary transformer. Each falls back to its primary counterpart if not set.

Environment Variable Default Description
SGLANG_CACHE_DIT_SECONDARY_FN (from primary) First N blocks to always compute
SGLANG_CACHE_DIT_SECONDARY_BN (from primary) Last N blocks to always compute
SGLANG_CACHE_DIT_SECONDARY_WARMUP (from primary) Warmup steps before caching
SGLANG_CACHE_DIT_SECONDARY_RDT (from primary) Residual difference threshold
SGLANG_CACHE_DIT_SECONDARY_MC (from primary) Max continuous cached steps
SGLANG_CACHE_DIT_SECONDARY_TAYLORSEER (from primary) Enable TaylorSeer calibrator
SGLANG_CACHE_DIT_SECONDARY_TS_ORDER (from primary) TaylorSeer order (1 or 2)

Cloud Storage

These variables configure S3-compatible cloud storage for automatically uploading generated images and videos.

Environment Variable Default Description
SGLANG_CLOUD_STORAGE_TYPE not set Set to s3 to enable cloud storage
SGLANG_S3_BUCKET_NAME not set The name of the S3 bucket
SGLANG_S3_ENDPOINT_URL not set Custom endpoint URL (for MinIO, OSS, etc.)
SGLANG_S3_REGION_NAME us-east-1 AWS region name
SGLANG_S3_ACCESS_KEY_ID not set AWS Access Key ID
SGLANG_S3_SECRET_ACCESS_KEY not set AWS Secret Access Key

CUDA Crash Debugging

These variables enable kernel API logging and optional input/output dumps around diffusion CUDA kernel call boundaries. They are useful when tracking down CUDA crashes such as illegal memory access, device-side assert, or shape mismatches in custom kernels.

Environment Variable Default Description
SGLANG_KERNEL_API_LOGLEVEL 0 Controls crash-debug kernel API logging. 1 logs API names, 3 logs tensor metadata, 5 adds tensor statistics, and 10 also writes dump snapshots.
SGLANG_KERNEL_API_LOGDEST stdout Destination for crash-debug kernel API logs. Use stdout, stderr, or a file path. %i is replaced with the process PID.
SGLANG_KERNEL_API_DUMP_DIR sglang_kernel_api_dumps Output directory for level-10 kernel API dumps. %i is replaced with the process PID.
SGLANG_KERNEL_API_DUMP_INCLUDE not set Comma-separated wildcard patterns for kernel API names to include in level-10 dumps.
SGLANG_KERNEL_API_DUMP_EXCLUDE not set Comma-separated wildcard patterns for kernel API names to exclude from level-10 dumps.