Environment Variables

Runtime

Environment Variable	Default	Description
`SGLANG_DIFFUSION_TARGET_DEVICE`	`cuda`	Target device for inference (`cuda`, `rocm`, `xpu`, `npu`, `musa`, `mps`, `cpu`)
`SGLANG_DIFFUSION_ATTENTION_BACKEND`	not set	Override attention backend via env var (e.g. `fa`, `torch_sdpa`, `sage_attn`)
`SGLANG_DIFFUSION_ATTENTION_CONFIG`	not set	Path to attention backend configuration file (JSON/YAML)
`SGLANG_DIFFUSION_STAGE_LOGGING`	false	Enable per-stage timing logs
`SGLANG_DIFFUSION_SERVER_DEV_MODE`	false	Enable dev-only HTTP endpoints for debugging
`SGLANG_DIFFUSION_TORCH_PROFILER_DIR`	not set	Directory for torch profiler traces (absolute path). Enables profiling when set
`SGLANG_DIFFUSION_CACHE_ROOT`	`~/.cache/sgl_diffusion`	Root directory for cache files
`SGLANG_DIFFUSION_CONFIG_ROOT`	`~/.config/sgl_diffusion`	Root directory for configuration files
`SGLANG_DIFFUSION_LOGGING_LEVEL`	`INFO`	Default logging level
`SGLANG_DIFFUSION_WORKER_MULTIPROC_METHOD`	`fork`	Multiprocess context for workers (`fork` or `spawn`)
`SGLANG_USE_RUNAI_MODEL_STREAMER`	true	Use Run:AI model streamer for model loading
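
As a minimal sketch, the runtime variables above can be assembled in Python and handed to whatever process launches the diffusion runtime. The helper name `runtime_env` and the chosen values are illustrative and not part of SGLang; the sketch assumes the variables are read from the process environment at startup.

```python
import os


def runtime_env(base=None, **overrides):
    """Return a copy of an environment with SGLANG_DIFFUSION_* overrides applied.

    Hypothetical helper: keys are given without the prefix,
    e.g. logging_level="DEBUG" sets SGLANG_DIFFUSION_LOGGING_LEVEL.
    """
    env = dict(os.environ if base is None else base)
    for key, value in overrides.items():
        env["SGLANG_DIFFUSION_" + key.upper()] = str(value)
    return env


# Start from an empty base here so the printed result is easy to read.
env = runtime_env(
    base={},
    target_device="cuda",
    attention_backend="torch_sdpa",
    logging_level="DEBUG",
    worker_multiproc_method="spawn",
)
for name in sorted(env):
    print(f"{name}={env[name]}")
```

The resulting dictionary can be passed as the `env=` argument of `subprocess.run` or `subprocess.Popen`, which keeps the overrides scoped to the child process instead of mutating the parent's environment.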

Platform-Specific

Apple MPS

Environment Variable	Default	Description
`SGLANG_USE_MLX`	not set	Set to `1` to enable MLX fused Metal kernels for norm ops on MPS

ROCm (AMD GPUs)

Environment Variable	Default	Description
`SGLANG_USE_ROCM_VAE`	false	Use AITer GroupNorm in VAE for improved performance on ROCm
`SGLANG_USE_ROCM_CUDNN_BENCHMARK`	false	Enable MIOpen auto-tuning for VAE conv layers on ROCm

Quantization

Environment Variable	Default	Description
`SGLANG_DIFFUSION_FLASHINFER_FP4_GEMM_BACKEND`	not set	FlashInfer FP4 GEMM backend for generic NVFP4 fallback

Caching Acceleration

These variables configure caching acceleration for Diffusion Transformer (DiT) models. SGLang supports multiple caching strategies; see the caching documentation for an overview.

Cache-DiT Configuration

See the Cache-DiT documentation for detailed configuration.

Environment Variable	Default	Description
`SGLANG_CACHE_DIT_ENABLED`	false	Enable Cache-DiT acceleration
`SGLANG_CACHE_DIT_FN`	1	First N blocks to always compute
`SGLANG_CACHE_DIT_BN`	0	Last N blocks to always compute
`SGLANG_CACHE_DIT_WARMUP`	4	Warmup steps before caching
`SGLANG_CACHE_DIT_RDT`	0.24	Residual difference threshold
`SGLANG_CACHE_DIT_MC`	3	Max continuous cached steps
`SGLANG_CACHE_DIT_TAYLORSEER`	false	Enable TaylorSeer calibrator
`SGLANG_CACHE_DIT_TS_ORDER`	1	TaylorSeer order (1 or 2)
`SGLANG_CACHE_DIT_SCM_PRESET`	none	SCM preset (none/slow/medium/fast/ultra)
`SGLANG_CACHE_DIT_SCM_POLICY`	dynamic	SCM caching policy
`SGLANG_CACHE_DIT_SCM_COMPUTE_BINS`	not set	Custom SCM compute bins
`SGLANG_CACHE_DIT_SCM_CACHE_BINS`	not set	Custom SCM cache bins
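
The sketch below builds a Cache-DiT environment from the table above and rejects values outside the documented choices before anything is launched. The function `cache_dit_env` is hypothetical, and writing the enable flag as the string `true` is an assumption based on the `false` default shown in the table.

```python
ALLOWED_SCM_PRESETS = {"none", "slow", "medium", "fast", "ultra"}


def cache_dit_env(fn=1, bn=0, warmup=4, rdt=0.24, mc=3, ts_order=1, scm_preset="none"):
    """Return Cache-DiT environment variables as strings (hypothetical helper).

    Defaults mirror the documented defaults in the table.
    """
    if ts_order not in (1, 2):
        raise ValueError("TaylorSeer order must be 1 or 2")
    if scm_preset not in ALLOWED_SCM_PRESETS:
        raise ValueError(f"unknown SCM preset: {scm_preset!r}")
    return {
        # Assumption: the flag accepts the literal string "true".
        "SGLANG_CACHE_DIT_ENABLED": "true",
        "SGLANG_CACHE_DIT_FN": str(fn),
        "SGLANG_CACHE_DIT_BN": str(bn),
        "SGLANG_CACHE_DIT_WARMUP": str(warmup),
        "SGLANG_CACHE_DIT_RDT": str(rdt),
        "SGLANG_CACHE_DIT_MC": str(mc),
        "SGLANG_CACHE_DIT_TS_ORDER": str(ts_order),
        "SGLANG_CACHE_DIT_SCM_PRESET": scm_preset,
    }


cache_env = cache_dit_env(rdt=0.3, scm_preset="medium")
for name, value in cache_env.items():
    print(f"{name}={value}")
```

Validating in the launcher is a design choice for this sketch: a typo in a preset name fails immediately rather than surfacing later inside the runtime.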

Cache-DiT Secondary Transformer

For dual-transformer models (e.g., Wan2.2 with high/low-noise experts), these variables configure caching for the secondary transformer. Each falls back to its primary counterpart if not set.

Environment Variable	Default	Description
`SGLANG_CACHE_DIT_SECONDARY_FN`	(from primary)	First N blocks to always compute
`SGLANG_CACHE_DIT_SECONDARY_BN`	(from primary)	Last N blocks to always compute
`SGLANG_CACHE_DIT_SECONDARY_WARMUP`	(from primary)	Warmup steps before caching
`SGLANG_CACHE_DIT_SECONDARY_RDT`	(from primary)	Residual difference threshold
`SGLANG_CACHE_DIT_SECONDARY_MC`	(from primary)	Max continuous cached steps
`SGLANG_CACHE_DIT_SECONDARY_TAYLORSEER`	(from primary)	Enable TaylorSeer calibrator
`SGLANG_CACHE_DIT_SECONDARY_TS_ORDER`	(from primary)	TaylorSeer order (1 or 2)
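
The fallback rule described above can be illustrated with a short lookup. This is a sketch of the documented behavior, not SGLang's actual implementation; `resolve_secondary` is a hypothetical name.

```python
def resolve_secondary(env, suffix, default):
    """Resolve a secondary-transformer setting (hypothetical helper).

    Lookup order: SGLANG_CACHE_DIT_SECONDARY_<suffix>, then the primary
    SGLANG_CACHE_DIT_<suffix>, then the documented default.
    """
    for name in (f"SGLANG_CACHE_DIT_SECONDARY_{suffix}", f"SGLANG_CACHE_DIT_{suffix}"):
        if name in env:
            return env[name]
    return default


example_env = {
    "SGLANG_CACHE_DIT_RDT": "0.3",              # primary only
    "SGLANG_CACHE_DIT_WARMUP": "4",
    "SGLANG_CACHE_DIT_SECONDARY_WARMUP": "8",   # secondary overrides primary
}
print(resolve_secondary(example_env, "RDT", "0.24"))   # 0.3 (falls back to primary)
print(resolve_secondary(example_env, "WARMUP", "4"))   # 8 (secondary value wins)
print(resolve_secondary(example_env, "MC", "3"))       # 3 (neither set, default used)
```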

Cloud Storage

These variables configure S3-compatible cloud storage for automatically uploading generated images and videos.

Environment Variable	Default	Description
`SGLANG_CLOUD_STORAGE_TYPE`	not set	Set to `s3` to enable cloud storage
`SGLANG_S3_BUCKET_NAME`	not set	The name of the S3 bucket
`SGLANG_S3_ENDPOINT_URL`	not set	Custom endpoint URL (for MinIO, OSS, etc.)
`SGLANG_S3_REGION_NAME`	us-east-1	AWS region name
`SGLANG_S3_ACCESS_KEY_ID`	not set	AWS Access Key ID
`SGLANG_S3_SECRET_ACCESS_KEY`	not set	AWS Secret Access Key

CUDA Crash Debugging

These variables enable kernel API logging and optional input/output dumps around diffusion CUDA kernel call boundaries. They are useful when tracking down CUDA crashes such as illegal memory accesses, device-side asserts, or shape mismatches in custom kernels.

Environment Variable	Default	Description
`SGLANG_KERNEL_API_LOGLEVEL`	`0`	Controls crash-debug kernel API logging. `1` logs API names, `3` logs tensor metadata, `5` adds tensor statistics, and `10` also writes dump snapshots.
`SGLANG_KERNEL_API_LOGDEST`	`stdout`	Destination for crash-debug kernel API logs. Use `stdout`, `stderr`, or a file path. `%i` is replaced with the process PID.
`SGLANG_KERNEL_API_DUMP_DIR`	`sglang_kernel_api_dumps`	Output directory for level-10 kernel API dumps. `%i` is replaced with the process PID.
`SGLANG_KERNEL_API_DUMP_INCLUDE`	not set	Comma-separated wildcard patterns for kernel API names to include in level-10 dumps.
`SGLANG_KERNEL_API_DUMP_EXCLUDE`	not set	Comma-separated wildcard patterns for kernel API names to exclude from level-10 dumps.
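
The `%i` substitution and the include/exclude patterns can be sketched as follows. This only illustrates the documented behavior and is not SGLang's implementation: the helper names and the kernel API names are hypothetical, the wildcard syntax is assumed to be shell-style (as in Python's `fnmatch`), and letting an exclude match override an include match is also an assumption.

```python
import fnmatch
import os


def expand_pid(path, pid=None):
    """Replace every %i in a log or dump path with the process PID."""
    return path.replace("%i", str(os.getpid() if pid is None else pid))


def _patterns(value):
    """Split a comma-separated pattern list, ignoring empty entries."""
    return [p.strip() for p in value.split(",") if p.strip()] if value else []


def should_dump(api_name, include=None, exclude=None):
    """Decide whether a kernel API call would be dumped (assumed semantics)."""
    inc, exc = _patterns(include), _patterns(exclude)
    # With an include list, the name must match at least one pattern.
    if inc and not any(fnmatch.fnmatchcase(api_name, p) for p in inc):
        return False
    # Assumption: an exclude match always wins.
    return not any(fnmatch.fnmatchcase(api_name, p) for p in exc)


print(expand_pid("kernel_api_%i.log", pid=1234))
print(should_dump("attn_fwd", include="attn_*", exclude="*_bwd"))
print(should_dump("attn_bwd", include="attn_*", exclude="*_bwd"))
```

Including `%i` in `SGLANG_KERNEL_API_LOGDEST` or `SGLANG_KERNEL_API_DUMP_DIR` gives each worker process its own file or directory, which avoids interleaved output when several workers run at once.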
