---
title: Classification Models
---
This document describes the `/v1/classify` API endpoint in SGLang, which is compatible with vLLM's classification API format.
## Overview
The classification API runs a text input through a sequence classification model and returns the predicted label together with per-class probabilities. The request and response formats follow the classification API in vLLM 0.7.0.
## API endpoint
```text Output
POST /v1/classify
```
## Request format
```json Config
{
  "model": "model_name",
  "input": "text to classify"
}
```
### Parameters
<ParamField body="model" type="string" required>
The name of the classification model to use.
</ParamField>
<ParamField body="input" type="string" required>
The text to classify.
</ParamField>
<ParamField body="user" type="string">
User identifier for tracking.
</ParamField>
<ParamField body="rid" type="string">
Request ID for tracking.
</ParamField>
<ParamField body="priority" type="integer">
Request priority.
</ParamField>
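The required and optional fields above can be assembled programmatically. The following is a minimal sketch, not part of SGLang: `build_classify_payload` is a hypothetical helper name, and leaving unset optional fields out of the body (rather than sending `null`) is an assumption about what the server accepts.

```python
from typing import Any, Optional


def build_classify_payload(
    model: str,
    text: str,
    user: Optional[str] = None,
    rid: Optional[str] = None,
    priority: Optional[int] = None,
) -> dict[str, Any]:
    """Build a /v1/classify request body (hypothetical helper, not an SGLang API)."""
    # "model" and "input" are the two required fields.
    payload: dict[str, Any] = {"model": model, "input": text}
    optional = {"user": user, "rid": rid, "priority": priority}
    # Assumption: optional fields are omitted entirely when they are not set.
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload
```

The returned dictionary can be passed directly as the `json=` argument of `requests.post`.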
## Response format
```json Config
{
  "id": "classify-9bf17f2847b046c7b2d5495f4b4f9682",
  "object": "list",
  "created": 1745383213,
  "model": "jason9693/Qwen2.5-1.5B-apeach",
  "data": [
    {
      "index": 0,
      "label": "Default",
      "probs": [0.565970778465271, 0.4340292513370514],
      "num_classes": 2
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "total_tokens": 10,
    "completion_tokens": 0,
    "prompt_tokens_details": null
  }
}
```
### Response fields
<ResponseField name="id" type="string" required>
Unique identifier for the classification request.
</ResponseField>
<ResponseField name="object" type="string" required>
Always `"list"`.
</ResponseField>
<ResponseField name="created" type="integer" required>
Unix timestamp when the request was created.
</ResponseField>
<ResponseField name="model" type="string" required>
The model used for classification.
</ResponseField>
<ResponseField name="data" type="object[]" required>
Array of classification results.
<Expandable title="data fields">
<ResponseField name="index" type="integer">
Index of the result.
</ResponseField>
<ResponseField name="label" type="string">
Predicted class label.
</ResponseField>
<ResponseField name="probs" type="number[]">
Array of probabilities for each class.
</ResponseField>
<ResponseField name="num_classes" type="integer">
Total number of classes.
</ResponseField>
</Expandable>
</ResponseField>
<ResponseField name="usage" type="object" required>
Token usage information.
<Expandable title="usage fields">
<ResponseField name="prompt_tokens" type="integer">
Number of input tokens.
</ResponseField>
<ResponseField name="total_tokens" type="integer">
Total number of tokens.
</ResponseField>
<ResponseField name="completion_tokens" type="integer">
Number of completion tokens (always `0` for classification).
</ResponseField>
<ResponseField name="prompt_tokens_details" type="object">
Additional token details (optional).
</ResponseField>
</Expandable>
</ResponseField>
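To illustrate how the fields above fit together, here is a small sketch that walks the `data` array of a parsed response and reports each result's label and highest probability. `summarize_classification` is a hypothetical helper, and the sketch makes no assumption about how the server derives `label` from `probs`; it only reads the documented fields.

```python
from typing import Any


def summarize_classification(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Summarize each entry in response["data"] (hypothetical helper)."""
    summaries = []
    for item in response["data"]:
        probs = item["probs"]
        # Position of the largest probability in the "probs" array.
        top_index = max(range(len(probs)), key=probs.__getitem__)
        summaries.append(
            {
                "index": item["index"],
                "label": item["label"],
                "top_index": top_index,
                "top_prob": probs[top_index],
            }
        )
    return summaries
```

With `requests`, the input would typically be `response.json()` from a successful call to `/v1/classify`.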
## Example usage
<Tabs>
<Tab title="curl">
```bash Command
curl -v "http://127.0.0.1:8000/v1/classify" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "jason9693/Qwen2.5-1.5B-apeach",
    "input": "Loved the new café - coffee was great."
  }'
```
</Tab>
<Tab title="Python">
```python Example
import json

import requests

# Make classification request
response = requests.post(
    "http://127.0.0.1:8000/v1/classify",
    headers={"Content-Type": "application/json"},
    json={
        "model": "jason9693/Qwen2.5-1.5B-apeach",
        "input": "Loved the new café - coffee was great.",
    },
)

# Parse response
result = response.json()
print(json.dumps(result, indent=2))
```
</Tab>
</Tabs>
## Supported models
The classification API works with any classification model supported by SGLang, including:
<Tabs>
<Tab title="Classification models (multi-class)">
<table style={{width: "100%", borderCollapse: "collapse", tableLayout: "fixed"}}>
<colgroup>
<col style={{width: "50.0%"}} />
<col style={{width: "50.0%"}} />
</colgroup>
<thead>
<tr style={{borderBottom: "2px solid #d55816"}}>
<th style={{textAlign: "left", padding: "10px 12px", fontWeight: 700, whiteSpace: "nowrap", backgroundColor: "rgba(255,255,255,0.02)"}}>Model</th>
<th style={{textAlign: "left", padding: "10px 12px", fontWeight: 700, whiteSpace: "nowrap", backgroundColor: "rgba(255,255,255,0.05)"}}>Type</th>
</tr>
</thead>
<tbody>
<tr>
<td style={{padding: "9px 12px", fontWeight: 500, backgroundColor: "rgba(255,255,255,0.02)"}}>`LlamaForSequenceClassification`</td>
<td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.05)"}}>Multi-class classification</td>
</tr>
<tr>
<td style={{padding: "9px 12px", fontWeight: 500, backgroundColor: "rgba(255,255,255,0.02)"}}>`Qwen2ForSequenceClassification`</td>
<td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.05)"}}>Multi-class classification</td>
</tr>
<tr>
<td style={{padding: "9px 12px", fontWeight: 500, backgroundColor: "rgba(255,255,255,0.02)"}}>`Qwen3ForSequenceClassification`</td>
<td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.05)"}}>Multi-class classification</td>
</tr>
<tr>
<td style={{padding: "9px 12px", fontWeight: 500, backgroundColor: "rgba(255,255,255,0.02)"}}>`BertForSequenceClassification`</td>
<td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.05)"}}>Multi-class classification</td>
</tr>
<tr>
<td style={{padding: "9px 12px", fontWeight: 500, backgroundColor: "rgba(255,255,255,0.02)"}}>`Gemma2ForSequenceClassification`</td>
<td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.05)"}}>Multi-class classification</td>
</tr>
</tbody>
</table>
<Note>
The API automatically uses the `id2label` mapping from the model's `config.json` file to provide meaningful label names instead of generic class names. If `id2label` is not available, it falls back to `LABEL_0`, `LABEL_1`, etc., or `Class_0`, `Class_1` as a last resort.
</Note>
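The label lookup described in this note can be sketched as follows. This is an illustration only, not SGLang's code: `resolve_label` is a hypothetical name, and the sketch collapses the fallback chain into a single generic `LABEL_<n>` name because the exact conditions under which SGLang chooses `LABEL_<n>` over `Class_<n>` are not specified here.

```python
from typing import Any


def resolve_label(config: dict[str, Any], class_index: int) -> str:
    """Map a class index to a label name (illustrative, not SGLang's code)."""
    id2label = config.get("id2label") or {}
    # JSON object keys are strings, so a parsed config.json uses "0", "1", ...;
    # integer keys are also checked in case the mapping was built in Python.
    for key in (str(class_index), class_index):
        if key in id2label:
            return id2label[key]
    # Assumption: a generic name is used when no mapping entry exists.
    return f"LABEL_{class_index}"
```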
</Tab>
<Tab title="Reward models (single score)">
<table style={{width: "100%", borderCollapse: "collapse", tableLayout: "fixed"}}>
<colgroup>
<col style={{width: "50.0%"}} />
<col style={{width: "50.0%"}} />
</colgroup>
<thead>
<tr style={{borderBottom: "2px solid #d55816"}}>
<th style={{textAlign: "left", padding: "10px 12px", fontWeight: 700, whiteSpace: "nowrap", backgroundColor: "rgba(255,255,255,0.02)"}}>Model</th>
<th style={{textAlign: "left", padding: "10px 12px", fontWeight: 700, whiteSpace: "nowrap", backgroundColor: "rgba(255,255,255,0.05)"}}>Type</th>
</tr>
</thead>
<tbody>
<tr>
<td style={{padding: "9px 12px", fontWeight: 500, backgroundColor: "rgba(255,255,255,0.02)"}}>`InternLM2ForRewardModel`</td>
<td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.05)"}}>Single reward score</td>
</tr>
<tr>
<td style={{padding: "9px 12px", fontWeight: 500, backgroundColor: "rgba(255,255,255,0.02)"}}>`Qwen2ForRewardModel`</td>
<td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.05)"}}>Single reward score</td>
</tr>
<tr>
<td style={{padding: "9px 12px", fontWeight: 500, backgroundColor: "rgba(255,255,255,0.02)"}}>`LlamaForSequenceClassificationWithNormal_Weights`</td>
<td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.05)"}}>Special reward model</td>
</tr>
</tbody>
</table>
<Info>
The `/classify` endpoint in SGLang was originally designed for reward models but now supports all non-generative models. The `/v1/classify` endpoint provides a standardized vLLM-compatible interface for classification tasks.
</Info>
</Tab>
</Tabs>
## Error handling
The API signals failures with the following HTTP status codes and a JSON error body:
<table style={{width: "100%", borderCollapse: "collapse", tableLayout: "fixed"}}>
<colgroup>
<col style={{width: "50%"}} />
<col style={{width: "50%"}} />
</colgroup>
<thead>
<tr style={{borderBottom: "2px solid #d55816"}}>
<th style={{textAlign: "left", padding: "10px 12px", fontWeight: 700, whiteSpace: "nowrap", backgroundColor: "rgba(255,255,255,0.02)"}}>Status code</th>
<th style={{textAlign: "left", padding: "10px 12px", fontWeight: 700, whiteSpace: "nowrap", backgroundColor: "rgba(255,255,255,0.05)"}}>Meaning</th>
</tr>
</thead>
<tbody>
<tr>
<td style={{padding: "9px 12px", fontWeight: 500, backgroundColor: "rgba(255,255,255,0.02)"}}>`400 Bad Request`</td>
<td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.05)"}}>Invalid request format or missing required fields</td>
</tr>
<tr>
<td style={{padding: "9px 12px", fontWeight: 500, backgroundColor: "rgba(255,255,255,0.02)"}}>`500 Internal Server Error`</td>
<td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.05)"}}>Server-side processing error</td>
</tr>
</tbody>
</table>
Error response format:
```json Config
{
  "error": "Error message",
  "type": "error_type",
  "code": 400
}
```
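A client can turn this error body into an exception. The sketch below is one possible approach under stated assumptions: `ClassifyAPIError` and `check_classify_response` are hypothetical names, and the code assumes every 4xx or 5xx response carries the `error`, `type`, and `code` keys shown above, falling back to placeholder values when one is missing.

```python
from typing import Any


class ClassifyAPIError(Exception):
    """Hypothetical client-side exception for /v1/classify failures."""

    def __init__(self, status_code: int, message: str, error_type: str) -> None:
        super().__init__(f"{status_code} ({error_type}): {message}")
        self.status_code = status_code
        self.message = message
        self.error_type = error_type


def check_classify_response(status_code: int, body: dict[str, Any]) -> dict[str, Any]:
    """Return the body on success, or raise ClassifyAPIError on a 4xx/5xx status."""
    if status_code < 400:
        return body
    # Assumption: the error body uses the "error", "type", and "code" keys.
    raise ClassifyAPIError(
        status_code=body.get("code", status_code),
        message=body.get("error", "unknown error"),
        error_type=body.get("type", "unknown"),
    )
```

With `requests`, this would be called as `check_classify_response(response.status_code, response.json())`.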
## Implementation details
<Accordion title="Rust model gateway">
Handles routing and request/response models in
`sgl-model-gateway/src/protocols/spec.rs`.
</Accordion>
<Accordion title="Python HTTP server">
Implements the actual endpoint in
`python/sglang/srt/entrypoints/http_server.py`.
</Accordion>
<Accordion title="Classification service">
Handles the classification logic in
`python/sglang/srt/entrypoints/openai/serving_classify.py`.
</Accordion>
## Testing
Use the provided test script to verify the implementation:
<CodeGroup>
```bash Command
python test_classify_api.py
```
</CodeGroup>
## Compatibility
<Check>
This implementation is compatible with vLLM's classification API format,
allowing seamless migration from vLLM to SGLang for classification tasks.
</Check>