---
title: OpenAI API
sidebarTitle: OpenAI API
description: Image and video generation endpoints with LoRA adapter management.
---

The SGLang Diffusion HTTP server implements an OpenAI-compatible API for image and video generation, as well as dynamic LoRA adapter management.

## Prerequisites

- Python 3.11+ if you plan to use the OpenAI Python SDK.
- A running SGLang Diffusion server (see the [CLI reference](./cli) for launch instructions).

## Start the server

```bash
SERVER_ARGS=(
  --model-path Wan-AI/Wan2.1-T2V-1.3B-Diffusers
  --text-encoder-cpu-offload
  --pin-cpu-memory
  --num-gpus 4
  --ulysses-degree=2
  --ring-degree=2
  --port 30010
)

sglang serve "${SERVER_ARGS[@]}"
```

- `--model-path` -- path to the model or HuggingFace model ID
- `--port` -- HTTP port to listen on (default: `30000`)

### Get model information

**Endpoint:** `GET /models`

Returns model path, task type, pipeline configuration, and precision settings.

<CodeGroup>
```bash curl
curl -sS -X GET "http://localhost:30010/models"
```
</CodeGroup>

**Response:**

```json
{
  "model_path": "Wan-AI/Wan2.1-T2V-1.3B-Diffusers",
  "task_type": "T2V",
  "pipeline_name": "wan_pipeline",
  "pipeline_class": "WanPipeline",
  "num_gpus": 4,
  "dit_precision": "bf16",
  "vae_precision": "fp16"
}
```

---

## Image generation

The server implements an OpenAI-compatible Images API under the `/v1/images` namespace.

### Create an image

**Endpoint:** `POST /v1/images/generations`

<CodeGroup>
```python Python
import base64
from openai import OpenAI

client = OpenAI(api_key="sk-proj-1234567890", base_url="http://localhost:30010/v1")

img = client.images.generate(
    prompt="A calico cat playing a piano on stage",
    size="1024x1024",
    n=1,
    response_format="b64_json",
)

image_bytes = base64.b64decode(img.data[0].b64_json)
with open("output.png", "wb") as f:
    f.write(image_bytes)
```

```bash curl
curl -sS -X POST "http://localhost:30010/v1/images/generations" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-proj-1234567890" \
  -d '{
        "prompt": "A calico cat playing a piano on stage",
        "size": "1024x1024",
        "n": 1,
        "response_format": "b64_json"
      }'
```
</CodeGroup>

<Note>
If `response_format=url` is used and cloud storage is not configured, the API returns a relative URL like `/v1/images/<IMAGE_ID>/content`.
</Note>

### Edit an image

**Endpoint:** `POST /v1/images/edits`

Accepts a multipart form upload with input images and a text prompt. Returns either a base64-encoded image or a URL.

<Tabs>
  <Tab title="b64_json response">
    ```bash
    curl -sS -X POST "http://localhost:30010/v1/images/edits" \
      -H "Authorization: Bearer sk-proj-1234567890" \
      -F "image=@local_input_image.png" \
      -F "url=image_url.jpg" \
      -F "prompt=A calico cat playing a piano on stage" \
      -F "size=1024x1024" \
      -F "response_format=b64_json"
    ```
  </Tab>
  <Tab title="URL response">
    ```bash
    curl -sS -X POST "http://localhost:30010/v1/images/edits" \
      -H "Authorization: Bearer sk-proj-1234567890" \
      -F "image=@local_input_image.png" \
      -F "url=image_url.jpg" \
      -F "prompt=A calico cat playing a piano on stage" \
      -F "size=1024x1024" \
      -F "response_format=url"
    ```
  </Tab>
</Tabs>

### Download image content

When `response_format=url` is used, the API returns a relative URL like `/v1/images/<IMAGE_ID>/content`.

**Endpoint:** `GET /v1/images/{image_id}/content`

```bash
curl -sS -L "http://localhost:30010/v1/images/<IMAGE_ID>/content" \
  -H "Authorization: Bearer sk-proj-1234567890" \
  -o output.png
```
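
A client that receives a relative URL has to join it with the server address before fetching it. The following is a minimal sketch using only the standard library. The helper name `resolve_content_url` is hypothetical and is not part of SGLang or the OpenAI SDK, and the image ID `abc123` is a placeholder.

```python
from urllib.parse import urljoin


def resolve_content_url(base_url: str, returned_url: str) -> str:
    """Turn a URL returned by the API into an absolute URL.

    Relative paths such as /v1/images/<IMAGE_ID>/content are resolved
    against the server address. Absolute URLs (for example, from cloud
    storage) are returned unchanged.
    """
    return urljoin(base_url, returned_url)


# Placeholder image ID; the base URL matches the server started above.
print(resolve_content_url("http://localhost:30010/v1", "/v1/images/abc123/content"))
```

The resulting absolute URL can then be passed to `curl` as shown above or to any HTTP client.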

---

## Video generation

The server implements a subset of the OpenAI Videos API under the `/v1/videos` namespace.

### Create a video

**Endpoint:** `POST /v1/videos`

<CodeGroup>
```python Python
from openai import OpenAI

client = OpenAI(api_key="sk-proj-1234567890", base_url="http://localhost:30010/v1")

video = client.videos.create(
    prompt="A calico cat playing a piano on stage",
    size="1280x720"
)
print(f"Video ID: {video.id}, Status: {video.status}")
```

```bash curl
curl -sS -X POST "http://localhost:30010/v1/videos" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-proj-1234567890" \
  -d '{
        "prompt": "A calico cat playing a piano on stage",
        "size": "1280x720"
      }'
```
</CodeGroup>

### List videos

**Endpoint:** `GET /v1/videos`

<CodeGroup>
```python Python
videos = client.videos.list()
for item in videos.data:
    print(item.id, item.status)
```

```bash curl
curl -sS -X GET "http://localhost:30010/v1/videos" \
  -H "Authorization: Bearer sk-proj-1234567890"
```
</CodeGroup>

### Download video content

**Endpoint:** `GET /v1/videos/{video_id}/content`

<CodeGroup>
```python Python
import time

# `client` and `video` come from the "Create a video" example above.
video_id = video.id

# Poll the list endpoint until the job reports completion.
while True:
    page = client.videos.list()
    item = next((v for v in page.data if v.id == video_id), None)
    if item and item.status == "completed":
        break
    time.sleep(5)

# Download the finished video and write it to disk.
resp = client.videos.download_content(video_id=video_id)
with open("output.mp4", "wb") as f:
    f.write(resp.read())
```

```bash curl
curl -sS -L "http://localhost:30010/v1/videos/<VIDEO_ID>/content" \
  -H "Authorization: Bearer sk-proj-1234567890" \
  -o output.mp4
```
</CodeGroup>
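
A polling loop that waits only for `completed` never exits if a job fails or stalls. The sketch below adds a timeout and a failure check around a caller-supplied status function. The helper name `wait_for_status` is hypothetical, and the `"failed"` status value is an assumption that this page does not confirm.

```python
import time
from typing import Callable, Optional, Tuple


def wait_for_status(
    fetch_status: Callable[[], Optional[str]],
    done: str = "completed",
    failed: Tuple[str, ...] = ("failed",),  # assumption: name of the failure status
    timeout_s: float = 600.0,
    interval_s: float = 5.0,
) -> str:
    """Call fetch_status() repeatedly until it returns the `done` status.

    Raises RuntimeError on a failure status and TimeoutError once
    `timeout_s` seconds have elapsed without completion.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status == done:
            return status
        if status in failed:
            raise RuntimeError(f"generation ended with status {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout_s} seconds")
        time.sleep(interval_s)
```

With the client from the examples above, `fetch_status` could be `lambda: next((v.status for v in client.videos.list().data if v.id == video_id), None)`.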

---

## LoRA management

The server supports dynamic loading, merging, and unmerging of LoRA adapters.

<Info>
- **Mutual exclusion:** Only one LoRA can be merged (active) at a time.
- **Switching:** To switch LoRAs, you must first unmerge the current one, then set the new one.
- **Caching:** The server caches loaded LoRA weights in memory. Switching back to a previously loaded LoRA (same path) has negligible cost.
</Info>

### Set LoRA adapter

Loads one or more LoRA adapters and merges their weights into the model. Each parameter accepts either a single value (the original, backward-compatible form) or a list with one entry per adapter.

**Endpoint:** `POST /v1/set_lora`

**Parameters:**

| Parameter | Type | Description |
|:--|:--|:--|
| `lora_nickname` | string or list | A unique identifier for the LoRA adapter(s). Required |
| `lora_path` | string or list | Path to `.safetensors` file(s) or HuggingFace repo ID(s). Required for first load; optional when re-activating a cached nickname |
| `target` | string or list | Which transformer(s) to apply the LoRA to: `"all"` (default), `"transformer"`, `"transformer_2"`, `"critic"` |
| `strength` | float or list | LoRA strength for merge (default: `1.0`). Values below `1.0` reduce the effect; values above `1.0` amplify it |

<Tabs>
  <Tab title="Single LoRA">
    ```bash
    curl -X POST http://localhost:30010/v1/set_lora \
      -H "Content-Type: application/json" \
      -d '{
            "lora_nickname": "lora_name",
            "lora_path": "/path/to/lora.safetensors",
            "target": "all",
            "strength": 0.8
          }'
    ```
  </Tab>
  <Tab title="Multiple LoRAs">
    ```bash
    curl -X POST http://localhost:30010/v1/set_lora \
      -H "Content-Type: application/json" \
      -d '{
            "lora_nickname": ["lora_1", "lora_2"],
            "lora_path": ["/path/to/lora1.safetensors", "/path/to/lora2.safetensors"],
            "target": ["transformer", "transformer_2"],
            "strength": [0.8, 1.0]
          }'
    ```
  </Tab>
  <Tab title="Same target">
    ```bash
    curl -X POST http://localhost:30010/v1/set_lora \
      -H "Content-Type: application/json" \
      -d '{
            "lora_nickname": ["style_lora", "character_lora"],
            "lora_path": ["/path/to/style.safetensors", "/path/to/character.safetensors"],
            "target": "all",
            "strength": [0.7, 0.9]
          }'
    ```
  </Tab>
</Tabs>

<Note>
When using multiple LoRAs:
- Parameters passed as lists (`lora_nickname`, `lora_path`, `target`, `strength`) must all have the same length.
- If `target` or `strength` is a single value, it is applied to every LoRA.
- Multiple LoRAs applied to the same target will be merged in order.
</Note>
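
The length rule can be checked on the client before sending a request. The sketch below builds a `set_lora` request body and rejects list parameters of different lengths. The function name `build_set_lora_payload` is hypothetical, and the check mirrors only the rules stated in the note above; the server may validate more.

```python
from typing import Any, Dict, List, Union


def build_set_lora_payload(
    lora_nickname: Union[str, List[str]],
    lora_path: Union[str, List[str]],
    target: Union[str, List[str]] = "all",
    strength: Union[float, List[float]] = 1.0,
) -> Dict[str, Any]:
    """Build a JSON-serializable body for POST /v1/set_lora.

    Parameters given as lists must share one length. Single values are
    passed through unchanged and left for the server to apply to all LoRAs.
    """
    payload: Dict[str, Any] = {
        "lora_nickname": lora_nickname,
        "lora_path": lora_path,
        "target": target,
        "strength": strength,
    }
    # Collect the length of every parameter that was given as a list.
    lengths = {k: len(v) for k, v in payload.items() if isinstance(v, list)}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"list parameters differ in length: {lengths}")
    return payload


body = build_set_lora_payload(
    ["style_lora", "character_lora"],
    ["/path/to/style.safetensors", "/path/to/character.safetensors"],
    target="all",
    strength=[0.7, 0.9],
)
```

Serializing `body` with `json.dumps` produces the same request body as the "Same target" example above.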

### Merge LoRA weights

Manually merges the currently set LoRA weights into the base model.

**Endpoint:** `POST /v1/merge_lora_weights`

| Parameter | Type | Description |
|:--|:--|:--|
| `target` | string | Which transformer(s) to merge: `"all"` (default), `"transformer"`, `"transformer_2"`, `"critic"` |
| `strength` | float | LoRA strength for merge (default: `1.0`) |

```bash
curl -X POST http://localhost:30010/v1/merge_lora_weights \
  -H "Content-Type: application/json" \
  -d '{"strength": 0.8}'
```

<Tip>
`set_lora` automatically performs a merge, so this endpoint is typically only needed if you have manually unmerged but want to re-apply the same LoRA without calling `set_lora` again.
</Tip>

### Unmerge LoRA weights

Unmerges the currently active LoRA weights from the base model, restoring it to its original state. Call this before setting a different LoRA.

**Endpoint:** `POST /v1/unmerge_lora_weights`

```bash
curl -X POST http://localhost:30010/v1/unmerge_lora_weights \
  -H "Content-Type: application/json"
```

### List LoRA adapters

Returns loaded LoRA adapters and current application status per module.

**Endpoint:** `GET /v1/list_loras`

```bash
curl -sS -X GET "http://localhost:30010/v1/list_loras"
```

**Response:**

```json
{
  "loaded_adapters": [
    { "nickname": "lora_a", "path": "/weights/lora_a.safetensors" },
    { "nickname": "lora_b", "path": "/weights/lora_b.safetensors" }
  ],
  "active": {
    "transformer": [
      {
        "nickname": "lora2",
        "path": "tarn59/pixel_art_style_lora_z_image_turbo",
        "merged": true,
        "strength": 1.0
      }
    ]
  }
}
```
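
To check which adapters are currently merged, a client can walk the `active` map. This is a minimal sketch that assumes the response always has the shape shown above; the function name `merged_adapters` is hypothetical and the sample data is abbreviated from that response.

```python
import json
from typing import Any, Dict, List

# Abbreviated copy of the sample response shown above.
SAMPLE_RESPONSE = """
{
  "loaded_adapters": [
    {"nickname": "lora_a", "path": "/weights/lora_a.safetensors"}
  ],
  "active": {
    "transformer": [
      {"nickname": "lora2", "path": "tarn59/pixel_art_style_lora_z_image_turbo",
       "merged": true, "strength": 1.0}
    ]
  }
}
"""


def merged_adapters(response: Dict[str, Any]) -> Dict[str, List[str]]:
    """Map each module under "active" to the nicknames whose "merged" flag is true."""
    return {
        module: [entry["nickname"] for entry in entries if entry.get("merged")]
        for module, entries in response.get("active", {}).items()
    }


print(merged_adapters(json.loads(SAMPLE_RESPONSE)))
```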

### Example: switching LoRAs

1. **Set LoRA A**

```bash
curl -X POST http://localhost:30010/v1/set_lora \
  -H "Content-Type: application/json" \
  -d '{"lora_nickname": "lora_a", "lora_path": "path/to/A"}'
```

2. **Generate with LoRA A**

Run your image or video generation requests.

3. **Unmerge LoRA A**

```bash
curl -X POST http://localhost:30010/v1/unmerge_lora_weights \
  -H "Content-Type: application/json"
```

4. **Set LoRA B**

```bash
curl -X POST http://localhost:30010/v1/set_lora \
  -H "Content-Type: application/json" \
  -d '{"lora_nickname": "lora_b", "lora_path": "path/to/B"}'
```

5. **Generate with LoRA B**

Run your image or video generation requests with the new adapter.
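
The unmerge-then-set part of this sequence can also be scripted. The sketch below uses only the standard library and is one possible way to drive the endpoints, not an official client. The names `switch_lora_steps` and `post_json` are hypothetical, and sending an empty JSON object as the unmerge body is an assumption.

```python
import json
import urllib.request
from typing import Any, Dict, List, Tuple

BASE_URL = "http://localhost:30010"  # server address used throughout this page


def switch_lora_steps(nickname: str, path: str) -> List[Tuple[str, Dict[str, Any]]]:
    """Return the ordered (endpoint, JSON body) pairs for switching LoRAs.

    The active LoRA is unmerged first, then the new one is set.
    """
    return [
        ("/v1/unmerge_lora_weights", {}),  # assumption: an empty body is accepted
        ("/v1/set_lora", {"lora_nickname": nickname, "lora_path": path}),
    ]


def post_json(endpoint: str, body: Dict[str, Any]) -> bytes:
    """POST a JSON body to the server and return the raw response bytes."""
    request = urllib.request.Request(
        BASE_URL + endpoint,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read()
```

With a server running, steps 3 and 4 become `for endpoint, body in switch_lora_steps("lora_b", "path/to/B"): post_json(endpoint, body)`.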

---

## Output quality

Control output quality and compression for both image and video generation through the `output-quality` and `output-compression` parameters.

### Parameters

| Parameter | Type | Description |
|:--|:--|:--|
| `output-quality` | string | Preset quality level. Default: `"default"` |
| `output-compression` | integer | Direct compression level override (0-100). When provided, takes precedence over `output-quality` |

**Quality presets:**

| Preset | Compression value |
|:--|:--|
| `"maximum"` | 100 |
| `"high"` | 90 |
| `"medium"` | 55 |
| `"low"` | 35 |
| `"default"` | Auto (50 for video, 75 for image) |

<Warning>
- When both `output-quality` and `output-compression` are provided, `output-compression` takes precedence.
- Quality settings apply to JPEG and video formats. PNG uses lossless compression and ignores these settings.
- Lower compression values (or `"low"` quality preset) produce smaller files but may show visible artifacts.
</Warning>
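
The precedence rules can be summarized in a few lines of code. This sketch restates the preset table and the override rule from this section. It is not the server's implementation, and the function name `effective_compression` and its `media` argument are hypothetical.

```python
from typing import Optional

# Values copied from the quality preset table above.
PRESETS = {"maximum": 100, "high": 90, "medium": 55, "low": 35}
DEFAULTS = {"video": 50, "image": 75}  # what the "default" preset resolves to


def effective_compression(
    quality: str = "default",
    compression: Optional[int] = None,
    media: str = "image",
) -> int:
    """Resolve the compression value that would apply to one request."""
    # An explicit compression value takes precedence over any preset.
    if compression is not None:
        if not 0 <= compression <= 100:
            raise ValueError("compression must be between 0 and 100")
        return compression
    if quality == "default":
        return DEFAULTS[media]
    return PRESETS[quality]
```

For example, `effective_compression("low", 80)` returns `80` because the explicit value overrides the preset, while `effective_compression("default", media="video")` returns `50`.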