# Environment Variables

## Runtime

| Environment Variable | Default | Description |
|----------------------|---------|-------------|
| `SGLANG_DIFFUSION_TARGET_DEVICE` | `cuda` | Target device for inference (`cuda`, `rocm`, `xpu`, `npu`, `musa`, `mps`, `cpu`) |
| `SGLANG_DIFFUSION_ATTENTION_BACKEND` | not set | Override the attention backend (e.g. `fa`, `torch_sdpa`, `sage_attn`) |
| `SGLANG_DIFFUSION_ATTENTION_CONFIG` | not set | Path to attention backend configuration file (JSON/YAML) |
| `SGLANG_DIFFUSION_STAGE_LOGGING` | false | Enable per-stage timing logs |
| `SGLANG_DIFFUSION_SERVER_DEV_MODE` | false | Enable dev-only HTTP endpoints for debugging |
| `SGLANG_DIFFUSION_TORCH_PROFILER_DIR` | not set | Absolute path to the directory for torch profiler traces. Setting it enables profiling |
| `SGLANG_DIFFUSION_CACHE_ROOT` | `~/.cache/sgl_diffusion` | Root directory for cache files |
| `SGLANG_DIFFUSION_CONFIG_ROOT` | `~/.config/sgl_diffusion` | Root directory for configuration files |
| `SGLANG_DIFFUSION_LOGGING_LEVEL` | `INFO` | Default logging level |
| `SGLANG_DIFFUSION_WORKER_MULTIPROC_METHOD` | `fork` | Multiprocess context for workers (`fork` or `spawn`) |
| `SGLANG_USE_RUNAI_MODEL_STREAMER` | true | Use Run:AI model streamer for model loading |
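
Environment variables are normally set before the process that reads them starts. The following is a minimal Python sketch of preparing such an environment for a child process. The helper name `build_runtime_env`, the `DEBUG` level, and the `profiler_traces` directory are illustrative assumptions, not part of SGLang.

```python
import os


def build_runtime_env(base=None):
    """Return a copy of the environment with a few runtime overrides.

    Hypothetical helper for illustration; it only builds a dict.
    """
    env = dict(os.environ if base is None else base)
    # Values taken from the examples listed in the table above.
    env["SGLANG_DIFFUSION_ATTENTION_BACKEND"] = "torch_sdpa"
    env["SGLANG_DIFFUSION_WORKER_MULTIPROC_METHOD"] = "spawn"
    # Assumption: standard Python logging level names are accepted.
    env["SGLANG_DIFFUSION_LOGGING_LEVEL"] = "DEBUG"
    # The profiler directory is documented as an absolute path,
    # so resolve a relative name before exporting it.
    env["SGLANG_DIFFUSION_TORCH_PROFILER_DIR"] = os.path.abspath("profiler_traces")
    return env


env = build_runtime_env(base={})
```

The resulting mapping can be passed as `env=` to `subprocess.run` together with your usual launch command.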

## Platform-Specific

### Apple MPS

| Environment Variable | Default | Description                                                  |
|----------------------|---------|--------------------------------------------------------------|
| `SGLANG_USE_MLX`     | not set | Set to `1` to enable MLX fused Metal kernels for norm ops on MPS |

### ROCm (AMD GPUs)

| Environment Variable | Default | Description |
|----------------------|---------|-------------|
| `SGLANG_USE_ROCM_VAE` | false | Use AITer GroupNorm in VAE for improved performance on ROCm |
| `SGLANG_USE_ROCM_CUDNN_BENCHMARK` | false | Enable MIOpen auto-tuning for VAE conv layers on ROCm |

### Quantization

| Environment Variable | Default | Description |
|----------------------|---------|-------------|
| `SGLANG_DIFFUSION_FLASHINFER_FP4_GEMM_BACKEND` | not set | FlashInfer FP4 GEMM backend for generic NVFP4 fallback |

## Caching Acceleration

These variables configure caching acceleration for Diffusion Transformer (DiT) models.
SGLang supports multiple caching strategies; see the [caching documentation](performance/cache/index.md) for an overview.

### Cache-DiT Configuration

See the [Cache-DiT documentation](performance/cache/cache_dit.md) for detailed configuration.

| Environment Variable                | Default | Description                              |
|-------------------------------------|---------|------------------------------------------|
| `SGLANG_CACHE_DIT_ENABLED`          | false   | Enable Cache-DiT acceleration            |
| `SGLANG_CACHE_DIT_FN`               | 1       | First N blocks to always compute         |
| `SGLANG_CACHE_DIT_BN`               | 0       | Last N blocks to always compute          |
| `SGLANG_CACHE_DIT_WARMUP`           | 4       | Warmup steps before caching              |
| `SGLANG_CACHE_DIT_RDT`              | 0.24    | Residual difference threshold            |
| `SGLANG_CACHE_DIT_MC`               | 3       | Max continuous cached steps              |
| `SGLANG_CACHE_DIT_TAYLORSEER`       | false   | Enable TaylorSeer calibrator             |
| `SGLANG_CACHE_DIT_TS_ORDER`         | 1       | TaylorSeer order (1 or 2)                |
| `SGLANG_CACHE_DIT_SCM_PRESET`       | none    | SCM preset (none/slow/medium/fast/ultra) |
| `SGLANG_CACHE_DIT_SCM_POLICY`       | dynamic | SCM caching policy                       |
| `SGLANG_CACHE_DIT_SCM_COMPUTE_BINS` | not set | Custom SCM compute bins                  |
| `SGLANG_CACHE_DIT_SCM_CACHE_BINS`   | not set | Custom SCM cache bins                    |
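
To make the defaults in the table concrete, here is a small Python sketch that reads a subset of these variables into a typed structure. It is not SGLang's own parser: `CacheDitSettings`, `read_cache_dit_settings`, and the accepted boolean spellings are assumptions made for illustration.

```python
import os
from dataclasses import dataclass


@dataclass
class CacheDitSettings:
    # Defaults mirror the table above (SCM variables omitted for brevity).
    enabled: bool = False
    fn: int = 1
    bn: int = 0
    warmup: int = 4
    rdt: float = 0.24
    mc: int = 3
    taylorseer: bool = False
    ts_order: int = 1


def _as_bool(raw):
    # Assumption: "1", "true", and "yes" count as true; SGLang may differ.
    return str(raw).strip().lower() in {"1", "true", "yes"}


def read_cache_dit_settings(env=None):
    """Hypothetical reader: unset variables fall back to documented defaults."""
    env = os.environ if env is None else env
    prefix = "SGLANG_CACHE_DIT_"
    d = CacheDitSettings()
    return CacheDitSettings(
        enabled=_as_bool(env.get(prefix + "ENABLED", "false")),
        fn=int(env.get(prefix + "FN", d.fn)),
        bn=int(env.get(prefix + "BN", d.bn)),
        warmup=int(env.get(prefix + "WARMUP", d.warmup)),
        rdt=float(env.get(prefix + "RDT", d.rdt)),
        mc=int(env.get(prefix + "MC", d.mc)),
        taylorseer=_as_bool(env.get(prefix + "TAYLORSEER", "false")),
        ts_order=int(env.get(prefix + "TS_ORDER", d.ts_order)),
    )


settings = read_cache_dit_settings(
    {"SGLANG_CACHE_DIT_ENABLED": "true", "SGLANG_CACHE_DIT_RDT": "0.3"}
)
```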

### Cache-DiT Secondary Transformer

For dual-transformer models (e.g., Wan2.2 with high/low-noise experts), these variables configure caching for the secondary transformer. Each falls back to its primary counterpart if not set.

| Environment Variable | Default | Description |
|-------------------------------------|---------|------------------------------------------|
| `SGLANG_CACHE_DIT_SECONDARY_FN` | (from primary) | First N blocks to always compute |
| `SGLANG_CACHE_DIT_SECONDARY_BN` | (from primary) | Last N blocks to always compute |
| `SGLANG_CACHE_DIT_SECONDARY_WARMUP` | (from primary) | Warmup steps before caching |
| `SGLANG_CACHE_DIT_SECONDARY_RDT` | (from primary) | Residual difference threshold |
| `SGLANG_CACHE_DIT_SECONDARY_MC` | (from primary) | Max continuous cached steps |
| `SGLANG_CACHE_DIT_SECONDARY_TAYLORSEER` | (from primary) | Enable TaylorSeer calibrator |
| `SGLANG_CACHE_DIT_SECONDARY_TS_ORDER` | (from primary) | TaylorSeer order (1 or 2) |
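
The fallback rule described above can be sketched in Python as a three-step lookup: the secondary variable, then the primary variable, then the primary default. The function `resolve_secondary` and the `PRIMARY_DEFAULTS` mapping are hypothetical names used only to illustrate the lookup order; SGLang's internal implementation may differ.

```python
import os

# Primary defaults as documented in the Cache-DiT Configuration table.
PRIMARY_DEFAULTS = {
    "FN": "1",
    "BN": "0",
    "WARMUP": "4",
    "RDT": "0.24",
    "MC": "3",
    "TAYLORSEER": "false",
    "TS_ORDER": "1",
}


def resolve_secondary(name, env=None):
    """Return the raw string value used for the secondary transformer."""
    env = os.environ if env is None else env
    secondary = env.get(f"SGLANG_CACHE_DIT_SECONDARY_{name}")
    if secondary is not None:
        return secondary  # explicit secondary setting wins
    # Otherwise use the primary variable, or its documented default.
    return env.get(f"SGLANG_CACHE_DIT_{name}", PRIMARY_DEFAULTS[name])


# Only the primary threshold is set, so the secondary inherits it.
value = resolve_secondary("RDT", {"SGLANG_CACHE_DIT_RDT": "0.3"})
```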

## Cloud Storage

These variables configure S3-compatible cloud storage for automatically uploading generated images and videos.

| Environment Variable            | Default | Description                                            |
|---------------------------------|---------|--------------------------------------------------------|
| `SGLANG_CLOUD_STORAGE_TYPE`     | not set | Set to `s3` to enable cloud storage                    |
| `SGLANG_S3_BUCKET_NAME`         | not set | The name of the S3 bucket                              |
| `SGLANG_S3_ENDPOINT_URL`        | not set | Custom endpoint URL (for MinIO, OSS, etc.)             |
| `SGLANG_S3_REGION_NAME`         | us-east-1 | AWS region name                                      |
| `SGLANG_S3_ACCESS_KEY_ID`       | not set | AWS Access Key ID                                      |
| `SGLANG_S3_SECRET_ACCESS_KEY`   | not set | AWS Secret Access Key                                  |
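
As a rough illustration of how these variables fit together, the Python sketch below collects them into one dictionary. The helper `s3_settings`, its return shape, and the choice to raise when the bucket name is missing are assumptions for this example, not SGLang behavior.

```python
import os


def s3_settings(env=None):
    """Collect S3 settings from the environment, or return None if disabled."""
    env = os.environ if env is None else env
    # Cloud storage is enabled only when the type is set to "s3".
    if env.get("SGLANG_CLOUD_STORAGE_TYPE") != "s3":
        return None
    bucket = env.get("SGLANG_S3_BUCKET_NAME")
    if not bucket:
        # Assumption: treat a missing bucket as a configuration error.
        raise ValueError("SGLANG_S3_BUCKET_NAME must be set when using s3")
    return {
        "bucket": bucket,
        # None means "use the default AWS endpoint".
        "endpoint_url": env.get("SGLANG_S3_ENDPOINT_URL"),
        "region_name": env.get("SGLANG_S3_REGION_NAME", "us-east-1"),
        "access_key_id": env.get("SGLANG_S3_ACCESS_KEY_ID"),
        "secret_access_key": env.get("SGLANG_S3_SECRET_ACCESS_KEY"),
    }


settings = s3_settings(
    {"SGLANG_CLOUD_STORAGE_TYPE": "s3", "SGLANG_S3_BUCKET_NAME": "my-outputs"}
)
```

The bucket name `my-outputs` is a placeholder. For S3-compatible services such as MinIO, `SGLANG_S3_ENDPOINT_URL` would also be set.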

## CUDA Crash Debugging

These variables enable kernel API logging and optional input/output dumps around diffusion CUDA kernel call boundaries. They are useful when tracking down CUDA crashes such as illegal memory accesses, device-side asserts, or shape mismatches in custom kernels.

| Environment Variable | Default | Description |
|----------------------|---------|-------------|
| `SGLANG_KERNEL_API_LOGLEVEL` | `0` | Controls crash-debug kernel API logging. `1` logs API names, `3` logs tensor metadata, `5` adds tensor statistics, and `10` also writes dump snapshots. |
| `SGLANG_KERNEL_API_LOGDEST` | `stdout` | Destination for crash-debug kernel API logs. Use `stdout`, `stderr`, or a file path. `%i` is replaced with the process PID. |
| `SGLANG_KERNEL_API_DUMP_DIR` | `sglang_kernel_api_dumps` | Output directory for level-10 kernel API dumps. `%i` is replaced with the process PID. |
| `SGLANG_KERNEL_API_DUMP_INCLUDE` | not set | Comma-separated wildcard patterns for kernel API names to include in level-10 dumps. |
| `SGLANG_KERNEL_API_DUMP_EXCLUDE` | not set | Comma-separated wildcard patterns for kernel API names to exclude from level-10 dumps. |
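
The `%i` substitution and the include/exclude patterns can be pictured with the Python sketch below. It is an approximation under stated assumptions: the helper names, the kernel API names, and the rule that exclude patterns take precedence over include patterns are all hypothetical, and `fnmatch`-style wildcards are assumed.

```python
import os
from fnmatch import fnmatchcase


def expand_pid(template, pid=None):
    """Replace every "%i" in a path template with the process PID."""
    pid = os.getpid() if pid is None else pid
    return template.replace("%i", str(pid))


def split_patterns(raw):
    """Split a comma-separated pattern list, dropping empty entries."""
    return [p.strip() for p in (raw or "").split(",") if p.strip()]


def should_dump(api_name, include=None, exclude=None):
    """Decide whether a kernel API name passes the dump filters."""
    inc = split_patterns(include)
    exc = split_patterns(exclude)
    # With an include list, the name must match at least one pattern.
    if inc and not any(fnmatchcase(api_name, p) for p in inc):
        return False
    # Assumption: any exclude match suppresses the dump.
    return not any(fnmatchcase(api_name, p) for p in exc)


log_path = expand_pid("logs/kernel_api_%i.log", pid=1234)
dump_attention = should_dump("attn.forward", include="attn.*,norm.*")
```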