sglang/docs/diffusion/environment_variables.md
2026-04-17 16:23:46 +08:00


# Environment Variables
## Runtime
| Environment Variable | Default | Description |
|----------------------|---------|-------------|
| `SGLANG_DIFFUSION_TARGET_DEVICE` | `cuda` | Target device for inference (`cuda`, `rocm`, `xpu`, `npu`, `musa`, `mps`, `cpu`) |
| `SGLANG_DIFFUSION_ATTENTION_BACKEND` | not set | Override the attention backend (e.g. `fa`, `torch_sdpa`, `sage_attn`) |
| `SGLANG_DIFFUSION_ATTENTION_CONFIG` | not set | Path to attention backend configuration file (JSON/YAML) |
| `SGLANG_DIFFUSION_STAGE_LOGGING` | false | Enable per-stage timing logs |
| `SGLANG_DIFFUSION_SERVER_DEV_MODE` | false | Enable dev-only HTTP endpoints for debugging |
| `SGLANG_DIFFUSION_TORCH_PROFILER_DIR` | not set | Directory for torch profiler traces (absolute path). Enables profiling when set |
| `SGLANG_DIFFUSION_CACHE_ROOT` | `~/.cache/sgl_diffusion` | Root directory for cache files |
| `SGLANG_DIFFUSION_CONFIG_ROOT` | `~/.config/sgl_diffusion` | Root directory for configuration files |
| `SGLANG_DIFFUSION_LOGGING_LEVEL` | `INFO` | Default logging level |
| `SGLANG_DIFFUSION_WORKER_MULTIPROC_METHOD` | `fork` | Multiprocess context for workers (`fork` or `spawn`) |
| `SGLANG_USE_RUNAI_MODEL_STREAMER` | true | Use Run:AI model streamer for model loading |
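
As a minimal sketch of how the runtime variables above might be prepared from Python before a server process is started, the helper below builds an environment mapping. The name `build_runtime_env` is hypothetical and not part of SGLang, and the sketch assumes boolean variables accept the string `true`, which the table does not state.

```python
import os


def build_runtime_env(profiler_dir=None, stage_logging=False, base=None):
    """Return an environment mapping with diffusion runtime variables set.

    Hypothetical helper for illustration only; not part of SGLang.
    """
    # Start from the given mapping, or from a copy of the current environment.
    env = dict(os.environ if base is None else base)
    if stage_logging:
        # Assumption: boolean variables accept the string "true".
        env["SGLANG_DIFFUSION_STAGE_LOGGING"] = "true"
    if profiler_dir is not None:
        # The table asks for an absolute path, so reject relative ones early.
        if not os.path.isabs(profiler_dir):
            raise ValueError(
                "SGLANG_DIFFUSION_TORCH_PROFILER_DIR must be an absolute path"
            )
        env["SGLANG_DIFFUSION_TORCH_PROFILER_DIR"] = profiler_dir
    return env


env = build_runtime_env(profiler_dir="/tmp/sgl_traces", stage_logging=True, base={})
print(env)
```

The resulting mapping can be passed as the `env` argument of `subprocess.run` or `subprocess.Popen` when launching the process that should see these settings.
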
## Platform-Specific
### Apple MPS
| Environment Variable | Default | Description |
|----------------------|---------|--------------------------------------------------------------|
| `SGLANG_USE_MLX` | not set | Set to `1` to enable MLX fused Metal kernels for norm ops on MPS |
### ROCm (AMD GPUs)
| Environment Variable | Default | Description |
|----------------------|---------|-------------|
| `SGLANG_USE_ROCM_VAE` | false | Use AITer GroupNorm in VAE for improved performance on ROCm |
| `SGLANG_USE_ROCM_CUDNN_BENCHMARK` | false | Enable MIOpen auto-tuning for VAE conv layers on ROCm |
### Quantization
| Environment Variable | Default | Description |
|----------------------|---------|-------------|
| `SGLANG_DIFFUSION_FLASHINFER_FP4_GEMM_BACKEND` | not set | FlashInfer FP4 GEMM backend for generic NVFP4 fallback |
## Caching Acceleration
These variables configure caching acceleration for Diffusion Transformer (DiT) models.
SGLang supports multiple caching strategies; see the [caching documentation](performance/cache/index.md) for an overview.
### Cache-DiT Configuration
See the [cache-dit documentation](performance/cache/cache_dit.md) for detailed configuration.

| Environment Variable | Default | Description |
|-------------------------------------|---------|------------------------------------------|
| `SGLANG_CACHE_DIT_ENABLED` | false | Enable Cache-DiT acceleration |
| `SGLANG_CACHE_DIT_FN` | 1 | First N blocks to always compute |
| `SGLANG_CACHE_DIT_BN` | 0 | Last N blocks to always compute |
| `SGLANG_CACHE_DIT_WARMUP` | 4 | Warmup steps before caching |
| `SGLANG_CACHE_DIT_RDT` | 0.24 | Residual difference threshold |
| `SGLANG_CACHE_DIT_MC` | 3 | Max continuous cached steps |
| `SGLANG_CACHE_DIT_TAYLORSEER` | false | Enable TaylorSeer calibrator |
| `SGLANG_CACHE_DIT_TS_ORDER` | 1 | TaylorSeer order (1 or 2) |
| `SGLANG_CACHE_DIT_SCM_PRESET` | none | SCM preset (none/slow/medium/fast/ultra) |
| `SGLANG_CACHE_DIT_SCM_POLICY` | dynamic | SCM caching policy |
| `SGLANG_CACHE_DIT_SCM_COMPUTE_BINS` | not set | Custom SCM compute bins |
| `SGLANG_CACHE_DIT_SCM_CACHE_BINS` | not set | Custom SCM cache bins |
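
The following sketch shows one way to assemble the Cache-DiT variables from the table into an environment mapping, using the documented defaults as keyword defaults. The function `cache_dit_env` is a hypothetical helper, and the string forms `true`/`false` for boolean values are an assumption.

```python
def cache_dit_env(fn=1, bn=0, warmup=4, rdt=0.24, mc=3,
                  taylorseer=False, ts_order=1):
    """Build Cache-DiT environment variables (hypothetical helper).

    Keyword defaults mirror the defaults listed in the table.
    """
    if ts_order not in (1, 2):
        # The table documents only orders 1 and 2.
        raise ValueError("SGLANG_CACHE_DIT_TS_ORDER must be 1 or 2")
    return {
        # Assumption: boolean variables accept "true" / "false".
        "SGLANG_CACHE_DIT_ENABLED": "true",
        "SGLANG_CACHE_DIT_FN": str(fn),
        "SGLANG_CACHE_DIT_BN": str(bn),
        "SGLANG_CACHE_DIT_WARMUP": str(warmup),
        "SGLANG_CACHE_DIT_RDT": str(rdt),
        "SGLANG_CACHE_DIT_MC": str(mc),
        "SGLANG_CACHE_DIT_TAYLORSEER": "true" if taylorseer else "false",
        "SGLANG_CACHE_DIT_TS_ORDER": str(ts_order),
    }


# Enable Cache-DiT with a second-order TaylorSeer calibrator.
cache_env = cache_dit_env(taylorseer=True, ts_order=2)
```

Environment variables are always strings, so numeric settings are converted with `str` before being stored.
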
### Cache-DiT Secondary Transformer
For dual-transformer models (e.g., Wan2.2 with high/low-noise experts), these variables configure caching for the secondary transformer. Each variable falls back to its primary counterpart when it is not set.

| Environment Variable | Default | Description |
|-------------------------------------|---------|------------------------------------------|
| `SGLANG_CACHE_DIT_SECONDARY_FN` | (from primary) | First N blocks to always compute |
| `SGLANG_CACHE_DIT_SECONDARY_BN` | (from primary) | Last N blocks to always compute |
| `SGLANG_CACHE_DIT_SECONDARY_WARMUP` | (from primary) | Warmup steps before caching |
| `SGLANG_CACHE_DIT_SECONDARY_RDT` | (from primary) | Residual difference threshold |
| `SGLANG_CACHE_DIT_SECONDARY_MC` | (from primary) | Max continuous cached steps |
| `SGLANG_CACHE_DIT_SECONDARY_TAYLORSEER` | (from primary) | Enable TaylorSeer calibrator |
| `SGLANG_CACHE_DIT_SECONDARY_TS_ORDER` | (from primary) | TaylorSeer order (1 or 2) |
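
The fallback rule described above can be illustrated with a short lookup. This is a sketch of the documented behavior, not SGLang's actual implementation; `resolve_secondary` is a hypothetical name.

```python
def resolve_secondary(env, suffix, default):
    """Resolve a secondary-transformer setting with fallback to the primary.

    Lookup order: SGLANG_CACHE_DIT_SECONDARY_<suffix>, then
    SGLANG_CACHE_DIT_<suffix>, then the supplied default.
    """
    secondary = env.get(f"SGLANG_CACHE_DIT_SECONDARY_{suffix}")
    if secondary is not None:
        return secondary
    return env.get(f"SGLANG_CACHE_DIT_{suffix}", default)


example_env = {
    "SGLANG_CACHE_DIT_RDT": "0.3",               # primary only
    "SGLANG_CACHE_DIT_WARMUP": "4",
    "SGLANG_CACHE_DIT_SECONDARY_WARMUP": "2",    # explicit secondary override
}

rdt = resolve_secondary(example_env, "RDT", "0.24")      # taken from the primary
warmup = resolve_secondary(example_env, "WARMUP", "4")   # taken from the secondary
mc = resolve_secondary(example_env, "MC", "3")           # neither set: default
```
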
## Cloud Storage
These variables configure S3-compatible cloud storage for automatically uploading generated images and videos.

| Environment Variable | Default | Description |
|---------------------------------|---------|--------------------------------------------------------|
| `SGLANG_CLOUD_STORAGE_TYPE` | not set | Set to `s3` to enable cloud storage |
| `SGLANG_S3_BUCKET_NAME` | not set | The name of the S3 bucket |
| `SGLANG_S3_ENDPOINT_URL` | not set | Custom endpoint URL (for MinIO, OSS, etc.) |
| `SGLANG_S3_REGION_NAME` | us-east-1 | AWS region name |
| `SGLANG_S3_ACCESS_KEY_ID` | not set | AWS Access Key ID |
| `SGLANG_S3_SECRET_ACCESS_KEY` | not set | AWS Secret Access Key |
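
As an illustrative sketch, the helper below collects the cloud storage variables for an S3-compatible service. The function name, the bucket name, the endpoint URL, and the credential placeholders are all hypothetical values chosen for the example.

```python
def s3_storage_env(bucket, access_key_id, secret_access_key,
                   endpoint_url=None, region="us-east-1"):
    """Build cloud storage environment variables (hypothetical helper)."""
    env = {
        "SGLANG_CLOUD_STORAGE_TYPE": "s3",
        "SGLANG_S3_BUCKET_NAME": bucket,
        "SGLANG_S3_REGION_NAME": region,
        "SGLANG_S3_ACCESS_KEY_ID": access_key_id,
        "SGLANG_S3_SECRET_ACCESS_KEY": secret_access_key,
    }
    if endpoint_url:
        # Only needed for non-AWS services such as MinIO or OSS.
        env["SGLANG_S3_ENDPOINT_URL"] = endpoint_url
    return env


# Placeholder values for a local MinIO instance; replace with real settings.
storage_env = s3_storage_env(
    bucket="generated-media",
    access_key_id="EXAMPLE_KEY_ID",
    secret_access_key="EXAMPLE_SECRET",
    endpoint_url="http://localhost:9000",
)
```

In practice, credentials are better read from a secret store or the existing environment than written into source code.
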
## CUDA Crash Debugging
These variables enable kernel API logging and optional input/output dumps around diffusion CUDA kernel call boundaries. They are useful when tracking down CUDA crashes such as illegal memory accesses, device-side asserts, or shape mismatches in custom kernels.

| Environment Variable | Default | Description |
|----------------------|---------|-------------|
| `SGLANG_KERNEL_API_LOGLEVEL` | `0` | Controls crash-debug kernel API logging. `1` logs API names, `3` logs tensor metadata, `5` adds tensor statistics, and `10` also writes dump snapshots. |
| `SGLANG_KERNEL_API_LOGDEST` | `stdout` | Destination for crash-debug kernel API logs. Use `stdout`, `stderr`, or a file path. `%i` is replaced with the process PID. |
| `SGLANG_KERNEL_API_DUMP_DIR` | `sglang_kernel_api_dumps` | Output directory for level-10 kernel API dumps. `%i` is replaced with the process PID. |
| `SGLANG_KERNEL_API_DUMP_INCLUDE` | not set | Comma-separated wildcard patterns for kernel API names to include in level-10 dumps. |
| `SGLANG_KERNEL_API_DUMP_EXCLUDE` | not set | Comma-separated wildcard patterns for kernel API names to exclude from level-10 dumps. |
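
The `%i` placeholder and the include/exclude patterns can be pictured with the sketch below. It is not SGLang's implementation: it assumes shell-style wildcards as implemented by Python's `fnmatch`, and it assumes that an exclude match takes precedence over an include match. The kernel API names used are hypothetical.

```python
import fnmatch
import os


def expand_pid(template, pid=None):
    """Replace every %i in a log destination or dump directory with the PID."""
    return template.replace("%i", str(os.getpid() if pid is None else pid))


def should_dump(api_name, include="", exclude=""):
    """Decide whether a kernel API call would be dumped at level 10.

    Assumptions: patterns are fnmatch-style wildcards, and exclude wins
    over include. An empty include list matches every name.
    """
    inc = [p.strip() for p in include.split(",") if p.strip()]
    exc = [p.strip() for p in exclude.split(",") if p.strip()]
    if any(fnmatch.fnmatchcase(api_name, p) for p in exc):
        return False
    return not inc or any(fnmatch.fnmatchcase(api_name, p) for p in inc)


log_path = expand_pid("/tmp/kernel_api_%i.log", pid=1234)
dump_fwd = should_dump("rmsnorm_fwd", include="rmsnorm*", exclude="*_bwd")
dump_bwd = should_dump("rmsnorm_bwd", include="rmsnorm*", exclude="*_bwd")
```

Using a per-process path via `%i` keeps logs from multiple worker processes from being interleaved in one file.
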