mirror of
https://github.com/kvcache-ai/sglang.git
synced 2026-07-02 21:37:11 +00:00
Co-authored-by: AdityaVKochar <adityavardhankochar@gmail.com> Co-authored-by: mintlify[bot] <109931778+mintlify[bot]@users.noreply.github.com> Co-authored-by: adhyan-jain <adhyanjain2006@gmail.com> Co-authored-by: Adhyan Jain <71976554+adhyan-jain@users.noreply.github.com> Co-authored-by: Maitri-shah29 <maitrirajivshah@gmail.com> Co-authored-by: Adarsh Shirawalmath <114558126+adarshxs@users.noreply.github.com> Co-authored-by: Maitri Shah <shah29maitri@gmail.com> Co-authored-by: Aditya Vardhan Kochar <80113212+AdityaVKochar@users.noreply.github.com> Co-authored-by: Rishit Shivam <164783543+pokymono@users.noreply.github.com> Co-authored-by: Rishitshivam <164783543+Rishitshivam@users.noreply.github.com> Co-authored-by: IshhanKheria <ishhankheria06@gmail.com> Co-authored-by: Ishita Joshi <ishitata.joshi@gmail.com> Co-authored-by: Richard Chen <104477092+Richardczl98@users.noreply.github.com> Co-authored-by: longGGGGGG <553746008@qq.com> Co-authored-by: Richard <richardchen@radixark.ai> Co-authored-by: Nakul Sinha <nakul.new4socials@gmail.com> Co-authored-by: Divyam Agrawal <ludicrouslytrue@gmail.com> Co-authored-by: Richardczl98 <Zhenlinc@stanford.edu> Co-authored-by: Krishang Zinzuwadia <krishangzinzuwadia@gmail.com> Co-authored-by: nimeshas <nimesha.s106@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Jignas Paturu <86356085+JignasP@users.noreply.github.com> Co-authored-by: zijiexia <37504505+zijiexia@users.noreply.github.com>
---
title: OpenAI API
sidebarTitle: OpenAI API
description: Image and video generation endpoints with LoRA adapter management.
---

The SGLang Diffusion HTTP server implements an OpenAI-compatible API for image and video generation, as well as dynamic LoRA adapter management.

## Prerequisites

- Python 3.11+ if you plan to use the OpenAI Python SDK.
- A running SGLang Diffusion server (see the [CLI reference](./cli) for launch instructions).

## Start the server

```bash
SERVER_ARGS=(
    --model-path Wan-AI/Wan2.1-T2V-1.3B-Diffusers
    --text-encoder-cpu-offload
    --pin-cpu-memory
    --num-gpus 4
    --ulysses-degree=2
    --ring-degree=2
    --port 30010
)

sglang serve "${SERVER_ARGS[@]}"
```

- `--model-path` -- path to the model or HuggingFace model ID
- `--port` -- HTTP port to listen on (default: `30000`)

### Get model information

**Endpoint:** `GET /models`

Returns model path, task type, pipeline configuration, and precision settings.

<CodeGroup>
```bash curl
curl -sS -X GET "http://localhost:30010/models"
```
</CodeGroup>

**Response:**

```json
{
  "model_path": "Wan-AI/Wan2.1-T2V-1.3B-Diffusers",
  "task_type": "T2V",
  "pipeline_name": "wan_pipeline",
  "pipeline_class": "WanPipeline",
  "num_gpus": 4,
  "dit_precision": "bf16",
  "vae_precision": "fp16"
}
```
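
A client may want to read this response into a typed object before deciding, for example, whether the server handles video or image requests. The sketch below parses the sample response shown above. `ModelInfo` and `parse_model_info` are hypothetical names, and the field set is assumed to match the sample exactly; a real server may return additional keys, which this sketch ignores.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ModelInfo:
    """Fields taken from the sample response above (assumed, not exhaustive)."""

    model_path: str
    task_type: str
    pipeline_name: str
    pipeline_class: str
    num_gpus: int
    dit_precision: str
    vae_precision: str


def parse_model_info(body: str) -> ModelInfo:
    """Build a ModelInfo from a JSON response body, ignoring unknown keys."""
    data = json.loads(body)
    return ModelInfo(**{f.name: data[f.name] for f in fields(ModelInfo)})


# The sample response from this page, used here instead of a live request.
sample = """
{
  "model_path": "Wan-AI/Wan2.1-T2V-1.3B-Diffusers",
  "task_type": "T2V",
  "pipeline_name": "wan_pipeline",
  "pipeline_class": "WanPipeline",
  "num_gpus": 4,
  "dit_precision": "bf16",
  "vae_precision": "fp16"
}
"""

info = parse_model_info(sample)
print(info.task_type, info.num_gpus, info.dit_precision)
```
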

---

## Image generation

The server implements an OpenAI-compatible Images API under the `/v1/images` namespace.

### Create an image

**Endpoint:** `POST /v1/images/generations`

<CodeGroup>
```python Python
import base64
from openai import OpenAI

client = OpenAI(api_key="sk-proj-1234567890", base_url="http://localhost:30010/v1")

img = client.images.generate(
    prompt="A calico cat playing a piano on stage",
    size="1024x1024",
    n=1,
    response_format="b64_json",
)

image_bytes = base64.b64decode(img.data[0].b64_json)
with open("output.png", "wb") as f:
    f.write(image_bytes)
```

```bash curl
curl -sS -X POST "http://localhost:30010/v1/images/generations" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-proj-1234567890" \
  -d '{
    "prompt": "A calico cat playing a piano on stage",
    "size": "1024x1024",
    "n": 1,
    "response_format": "b64_json"
  }'
```
</CodeGroup>

<Note>
If `response_format=url` is used and cloud storage is not configured, the API returns a relative URL like `/v1/images/<IMAGE_ID>/content`.
</Note>

### Edit an image

**Endpoint:** `POST /v1/images/edits`

Accepts a multipart form upload with input images and a text prompt. Returns either a base64-encoded image or a URL.

<Tabs>
<Tab title="b64_json response">
```bash
curl -sS -X POST "http://localhost:30010/v1/images/edits" \
  -H "Authorization: Bearer sk-proj-1234567890" \
  -F "image=@local_input_image.png" \
  -F "url=image_url.jpg" \
  -F "prompt=A calico cat playing a piano on stage" \
  -F "size=1024x1024" \
  -F "response_format=b64_json"
```
</Tab>
<Tab title="URL response">
```bash
curl -sS -X POST "http://localhost:30010/v1/images/edits" \
  -H "Authorization: Bearer sk-proj-1234567890" \
  -F "image=@local_input_image.png" \
  -F "url=image_url.jpg" \
  -F "prompt=A calico cat playing a piano on stage" \
  -F "size=1024x1024" \
  -F "response_format=url"
```
</Tab>
</Tabs>

### Download image content

When `response_format=url` is used and cloud storage is not configured, the API returns a relative URL like `/v1/images/<IMAGE_ID>/content`. Request that path on the same server to retrieve the image bytes.

**Endpoint:** `GET /v1/images/{image_id}/content`

```bash
curl -sS -L "http://localhost:30010/v1/images/<IMAGE_ID>/content" \
  -H "Authorization: Bearer sk-proj-1234567890" \
  -o output.png
```
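
Because the returned URL may be relative, a client has to join it onto the server address before downloading. The sketch below does this with the standard library's `urljoin`. `SERVER_ROOT` and `resolve_content_url` are hypothetical names, and the port is taken from the examples on this page. Since the relative path already begins with `/v1`, joining it onto a base that also ends in `/v1` still gives a single `/v1` prefix, and absolute URLs pass through unchanged.

```python
from urllib.parse import urljoin

# Assumption: the server from the examples on this page, listening on port 30010.
SERVER_ROOT = "http://localhost:30010"


def resolve_content_url(url: str, server_root: str = SERVER_ROOT) -> str:
    """Return an absolute URL for the `url` value of an image response.

    A path such as /v1/images/<IMAGE_ID>/content is joined onto the server
    root; a value that is already absolute is returned as-is.
    """
    return urljoin(server_root, url)


# "abc123" is a made-up image ID used only for illustration.
print(resolve_content_url("/v1/images/abc123/content"))
print(resolve_content_url("/v1/images/abc123/content", "http://localhost:30010/v1"))
```
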

---

## Video generation

The server implements a subset of the OpenAI Videos API under the `/v1/videos` namespace.

### Create a video

**Endpoint:** `POST /v1/videos`

<CodeGroup>
```python Python
from openai import OpenAI

client = OpenAI(api_key="sk-proj-1234567890", base_url="http://localhost:30010/v1")

video = client.videos.create(
    prompt="A calico cat playing a piano on stage",
    size="1280x720"
)
print(f"Video ID: {video.id}, Status: {video.status}")
```

```bash curl
curl -sS -X POST "http://localhost:30010/v1/videos" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-proj-1234567890" \
  -d '{
    "prompt": "A calico cat playing a piano on stage",
    "size": "1280x720"
  }'
```
</CodeGroup>

### List videos

**Endpoint:** `GET /v1/videos`

<CodeGroup>
```python Python
videos = client.videos.list()
for item in videos.data:
    print(item.id, item.status)
```

```bash curl
curl -sS -X GET "http://localhost:30010/v1/videos" \
  -H "Authorization: Bearer sk-proj-1234567890"
```
</CodeGroup>

### Download video content

**Endpoint:** `GET /v1/videos/{video_id}/content`

<CodeGroup>
```python Python
import time

# ID of the job returned by client.videos.create(...) above
video_id = video.id

# Poll for completion
while True:
    page = client.videos.list()
    item = next((v for v in page.data if v.id == video_id), None)
    if item and item.status == "completed":
        break
    time.sleep(5)

# Download content
resp = client.videos.download_content(video_id=video_id)
with open("output.mp4", "wb") as f:
    f.write(resp.read())
```

```bash curl
curl -sS -L "http://localhost:30010/v1/videos/<VIDEO_ID>/content" \
  -H "Authorization: Bearer sk-proj-1234567890" \
  -o output.mp4
```
</CodeGroup>
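
The polling loop above runs forever if a job never reaches `completed`. One possible refinement, sketched below, wraps the loop in a helper with a timeout and an explicit failure check. `wait_for_status` and `fetch_status` are hypothetical names, and the `"failed"` status is an assumption borrowed from the OpenAI Videos API rather than something this page confirms for SGLang. Against a real server, `fetch_status` could be a function that calls `client.videos.list()` and returns the status of the matching item.

```python
import time
from typing import Callable, Tuple


def wait_for_status(
    fetch_status: Callable[[], str],
    *,
    done: str = "completed",
    failed: Tuple[str, ...] = ("failed",),  # assumed failure status
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
) -> str:
    """Call fetch_status() until it returns `done`, a failure status, or time runs out."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status == done:
            return status
        if status in failed:
            raise RuntimeError(f"job ended with status {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout_s} s")
        time.sleep(interval_s)


# Stand-in for the server: a fixed sequence of statuses instead of HTTP calls.
fake_statuses = iter(["queued", "in_progress", "completed"])
result = wait_for_status(lambda: next(fake_statuses), interval_s=0.0)
print(result)
```
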

---

## LoRA management

The server supports dynamic loading, merging, and unmerging of LoRA adapters.

<Info>
- **Mutual exclusion:** Only one LoRA can be merged (active) at a time.
- **Switching:** To switch LoRAs, you must first unmerge the current one, then set the new one.
- **Caching:** The server caches loaded LoRA weights in memory. Switching back to a previously loaded LoRA (same path) has negligible cost.
</Info>

### Set LoRA adapter

Loads one or more LoRA adapters and merges their weights into the model. Supports both single LoRA (backward compatible) and multiple LoRA adapters.

**Endpoint:** `POST /v1/set_lora`

**Parameters:**

| Parameter | Type | Description |
|:--|:--|:--|
| `lora_nickname` | string or list | A unique identifier for the LoRA adapter(s). Required |
| `lora_path` | string or list | Path to `.safetensors` file(s) or HuggingFace repo ID(s). Required for first load; optional when re-activating a cached nickname |
| `target` | string or list | Which transformer(s) to apply the LoRA to: `"all"` (default), `"transformer"`, `"transformer_2"`, `"critic"` |
| `strength` | float or list | LoRA strength for merge (default: `1.0`). Values < 1.0 reduce the effect, > 1.0 amplify it |

<Tabs>
<Tab title="Single LoRA">
```bash
curl -X POST http://localhost:30010/v1/set_lora \
  -H "Content-Type: application/json" \
  -d '{
    "lora_nickname": "lora_name",
    "lora_path": "/path/to/lora.safetensors",
    "target": "all",
    "strength": 0.8
  }'
```
</Tab>
<Tab title="Multiple LoRAs">
```bash
curl -X POST http://localhost:30010/v1/set_lora \
  -H "Content-Type: application/json" \
  -d '{
    "lora_nickname": ["lora_1", "lora_2"],
    "lora_path": ["/path/to/lora1.safetensors", "/path/to/lora2.safetensors"],
    "target": ["transformer", "transformer_2"],
    "strength": [0.8, 1.0]
  }'
```
</Tab>
<Tab title="Same target">
```bash
curl -X POST http://localhost:30010/v1/set_lora \
  -H "Content-Type: application/json" \
  -d '{
    "lora_nickname": ["style_lora", "character_lora"],
    "lora_path": ["/path/to/style.safetensors", "/path/to/character.safetensors"],
    "target": "all",
    "strength": [0.7, 0.9]
  }'
```
</Tab>
</Tabs>

<Note>
When using multiple LoRAs:
- Parameters passed as lists (`lora_nickname`, `lora_path`, `target`, `strength`) must all have the same length.
- If `target` or `strength` is a single value, it is applied to all LoRAs.
- Multiple LoRAs applied to the same target are merged in the order given.
</Note>
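
The list rules above can be checked on the client before a request is sent. The following is a minimal sketch of that check, not the server's own validation: `normalize_lora_request` is a hypothetical helper that expands a single `target` or `strength` across all adapters and rejects lists of mismatched length.

```python
from typing import List, Sequence, Union

StrOrList = Union[str, Sequence[str]]
NumOrList = Union[float, Sequence[float]]


def normalize_lora_request(
    lora_nickname: StrOrList,
    lora_path: StrOrList,
    target: StrOrList = "all",
    strength: NumOrList = 1.0,
) -> List[dict]:
    """Expand a set_lora payload into one dict per adapter, in merge order."""
    nicknames = [lora_nickname] if isinstance(lora_nickname, str) else list(lora_nickname)
    paths = [lora_path] if isinstance(lora_path, str) else list(lora_path)
    if len(paths) != len(nicknames):
        raise ValueError("lora_path and lora_nickname must have the same length")
    count = len(nicknames)

    def broadcast(value, name):
        # A single value applies to every adapter; a list must match in length.
        if isinstance(value, (str, int, float)):
            return [value] * count
        values = list(value)
        if len(values) != count:
            raise ValueError(f"{name} must have {count} entries, got {len(values)}")
        return values

    targets = broadcast(target, "target")
    strengths = [float(s) for s in broadcast(strength, "strength")]
    return [
        {"nickname": n, "path": p, "target": t, "strength": s}
        for n, p, t, s in zip(nicknames, paths, targets, strengths)
    ]


# The "Same target" example from above: one target shared by two adapters.
adapters = normalize_lora_request(
    ["style_lora", "character_lora"],
    ["/path/to/style.safetensors", "/path/to/character.safetensors"],
    target="all",
    strength=[0.7, 0.9],
)
for adapter in adapters:
    print(adapter)
```

Running the check locally surfaces a length mismatch as a `ValueError` before any weights are loaded.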

### Merge LoRA weights

Manually merges the currently set LoRA weights into the base model.

**Endpoint:** `POST /v1/merge_lora_weights`

| Parameter | Type | Description |
|:--|:--|:--|
| `target` | string | Which transformer(s) to merge: `"all"` (default), `"transformer"`, `"transformer_2"`, `"critic"` |
| `strength` | float | LoRA strength for merge (default: `1.0`) |

```bash
curl -X POST http://localhost:30010/v1/merge_lora_weights \
  -H "Content-Type: application/json" \
  -d '{"strength": 0.8}'
```

<Tip>
`set_lora` automatically performs a merge, so this endpoint is typically only needed if you have manually unmerged but want to re-apply the same LoRA without calling `set_lora` again.
</Tip>

### Unmerge LoRA weights

Unmerges the currently active LoRA weights from the base model, restoring it to its original state. Call this before setting a different LoRA.

**Endpoint:** `POST /v1/unmerge_lora_weights`

```bash
curl -X POST http://localhost:30010/v1/unmerge_lora_weights \
  -H "Content-Type: application/json"
```

### List LoRA adapters

Returns loaded LoRA adapters and current application status per module.

**Endpoint:** `GET /v1/list_loras`

```bash
curl -sS -X GET "http://localhost:30010/v1/list_loras"
```

**Response:**

```json
{
  "loaded_adapters": [
    { "nickname": "lora_a", "path": "/weights/lora_a.safetensors" },
    { "nickname": "lora_b", "path": "/weights/lora_b.safetensors" }
  ],
  "active": {
    "transformer": [
      {
        "nickname": "lora2",
        "path": "tarn59/pixel_art_style_lora_z_image_turbo",
        "merged": true,
        "strength": 1.0
      }
    ]
  }
}
```
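
To check which adapters are currently merged before switching, a client can reduce this response to a per-module summary. The sketch below assumes the response has exactly the shape of the sample above; `merged_adapters` is a hypothetical helper name.

```python
import json
from typing import Dict, List


def merged_adapters(listing: dict) -> Dict[str, List[str]]:
    """Map each module name to the nicknames of adapters merged into it."""
    return {
        module: [entry["nickname"] for entry in entries if entry.get("merged")]
        for module, entries in listing.get("active", {}).items()
    }


# The sample response from this page, used here instead of a live request.
sample = json.loads("""
{
  "loaded_adapters": [
    { "nickname": "lora_a", "path": "/weights/lora_a.safetensors" },
    { "nickname": "lora_b", "path": "/weights/lora_b.safetensors" }
  ],
  "active": {
    "transformer": [
      {
        "nickname": "lora2",
        "path": "tarn59/pixel_art_style_lora_z_image_turbo",
        "merged": true,
        "strength": 1.0
      }
    ]
  }
}
""")

print(merged_adapters(sample))
```

An empty list for a module would mean nothing is merged there, so `set_lora` could be called without unmerging first.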

### Example: switching LoRAs

1. **Set LoRA A**

   ```bash
   curl -X POST http://localhost:30010/v1/set_lora \
     -H "Content-Type: application/json" \
     -d '{"lora_nickname": "lora_a", "lora_path": "path/to/A"}'
   ```

2. **Generate with LoRA A**

   Run your image or video generation requests.

3. **Unmerge LoRA A**

   ```bash
   curl -X POST http://localhost:30010/v1/unmerge_lora_weights
   ```

4. **Set LoRA B**

   ```bash
   curl -X POST http://localhost:30010/v1/set_lora \
     -H "Content-Type: application/json" \
     -d '{"lora_nickname": "lora_b", "lora_path": "path/to/B"}'
   ```

5. **Generate with LoRA B**

   Run your image or video generation requests with the new adapter.
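
The unmerge-then-set order in the steps above can be captured in a small helper so the two calls are always issued together. The sketch below only builds the requests with the standard library and does not send them. `BASE_URL`, `post_request`, and `switch_lora_requests` are hypothetical names, and the port is taken from the examples on this page.

```python
import json
import urllib.request
from typing import List, Optional

# Assumption: the server from the examples on this page.
BASE_URL = "http://localhost:30010/v1"


def post_request(path: str, payload: Optional[dict] = None) -> urllib.request.Request:
    """Build a POST request; a JSON body is attached only when a payload is given."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json"} if payload is not None else {}
    return urllib.request.Request(f"{BASE_URL}{path}", data=data, headers=headers, method="POST")


def switch_lora_requests(nickname: str, path: str) -> List[urllib.request.Request]:
    """Requests for switching adapters: unmerge the current one, then set the new one."""
    return [
        post_request("/unmerge_lora_weights"),
        post_request("/set_lora", {"lora_nickname": nickname, "lora_path": path}),
    ]


steps = switch_lora_requests("lora_b", "path/to/B")
for req in steps:
    print(req.get_method(), req.full_url)

# To run the switch against a live server, send each request in order:
#     for req in steps:
#         urllib.request.urlopen(req).close()
```
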

---

## Output quality

Control output quality and compression for both image and video generation through the `output-quality` and `output-compression` parameters.

### Parameters

| Parameter | Type | Description |
|:--|:--|:--|
| `output-quality` | string | Preset quality level. Default: `"default"` |
| `output-compression` | integer | Direct compression level override (0-100). When provided, takes precedence over `output-quality` |

**Quality presets:**

| Preset | Compression value |
|:--|:--|
| `"maximum"` | 100 |
| `"high"` | 90 |
| `"medium"` | 55 |
| `"low"` | 35 |
| `"default"` | Auto (50 for video, 75 for image) |

<Warning>
- When both `output-quality` and `output-compression` are provided, `output-compression` takes precedence.
- Quality settings apply to JPEG and video formats. PNG uses lossless compression and ignores these settings.
- Lower compression values (or `"low"` quality preset) produce smaller files but may show visible artifacts.
</Warning>
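
The precedence between the two parameters can be summarized as a small lookup. The sketch below mirrors the tables above and is an illustration only, not the server's implementation. `resolve_compression`, `PRESETS`, and `DEFAULTS` are hypothetical names, and the PNG exception is not modeled.

```python
from typing import Optional

# Values copied from the quality preset table above.
PRESETS = {"maximum": 100, "high": 90, "medium": 55, "low": 35}
DEFAULTS = {"video": 50, "image": 75}


def resolve_compression(
    media: str,
    output_quality: str = "default",
    output_compression: Optional[int] = None,
) -> int:
    """Return the effective compression value for an 'image' or 'video' request."""
    if output_compression is not None:
        # An explicit value takes precedence over any preset.
        if not 0 <= output_compression <= 100:
            raise ValueError("output-compression must be between 0 and 100")
        return output_compression
    if output_quality == "default":
        return DEFAULTS[media]
    return PRESETS[output_quality]


print(resolve_compression("image"))
print(resolve_compression("video", "high"))
print(resolve_compression("video", "high", output_compression=20))
```

In the last call the explicit value of 20 is used even though the `"high"` preset is also given.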