mirror of
https://github.com/kvcache-ai/sglang.git
synced 2026-06-30 19:57:52 +00:00
Co-authored-by: AdityaVKochar <adityavardhankochar@gmail.com> Co-authored-by: mintlify[bot] <109931778+mintlify[bot]@users.noreply.github.com> Co-authored-by: adhyan-jain <adhyanjain2006@gmail.com> Co-authored-by: Adhyan Jain <71976554+adhyan-jain@users.noreply.github.com> Co-authored-by: Maitri-shah29 <maitrirajivshah@gmail.com> Co-authored-by: Adarsh Shirawalmath <114558126+adarshxs@users.noreply.github.com> Co-authored-by: Maitri Shah <shah29maitri@gmail.com> Co-authored-by: Aditya Vardhan Kochar <80113212+AdityaVKochar@users.noreply.github.com> Co-authored-by: Rishit Shivam <164783543+pokymono@users.noreply.github.com> Co-authored-by: Rishitshivam <164783543+Rishitshivam@users.noreply.github.com> Co-authored-by: IshhanKheria <ishhankheria06@gmail.com> Co-authored-by: Ishita Joshi <ishitata.joshi@gmail.com> Co-authored-by: Richard Chen <104477092+Richardczl98@users.noreply.github.com> Co-authored-by: longGGGGGG <553746008@qq.com> Co-authored-by: Richard <richardchen@radixark.ai> Co-authored-by: Nakul Sinha <nakul.new4socials@gmail.com> Co-authored-by: Divyam Agrawal <ludicrouslytrue@gmail.com> Co-authored-by: Richardczl98 <Zhenlinc@stanford.edu> Co-authored-by: Krishang Zinzuwadia <krishangzinzuwadia@gmail.com> Co-authored-by: nimeshas <nimesha.s106@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Jignas Paturu <86356085+JignasP@users.noreply.github.com> Co-authored-by: zijiexia <37504505+zijiexia@users.noreply.github.com>
182 lines
8.6 KiB
Plaintext
182 lines
8.6 KiB
Plaintext
---
|
||
title: "GPT OSS Usage"
|
||
metatags:
|
||
description: "Deploy GPT-OSS with SGLang: OpenAI Responses API compatible, built-in tools for web search and Python execution, reasoning levels, MCP tool server support."
|
||
---
|
||
Please refer to [#8833](https://github.com/sgl-project/sglang/issues/8833).
|
||
|
||
## Responses API & Built-in Tools
|
||
|
||
### Responses API
|
||
|
||
GPT‑OSS is compatible with the OpenAI Responses API. Use `client.responses.create(...)` with `model`, `instructions`, `input`, and optional `tools` to enable built‑in tool use. You can set reasoning level via `instructions`, e.g., "Reasoning: high" (also supports "medium" and "low") — levels: low (fast), medium (balanced), high (deep).
|
||
|
||
### Built-in Tools
|
||
|
||
GPT‑OSS can call built‑in tools for web search and Python execution. You can use the demo tool server or connect to external MCP tool servers.
|
||
|
||
#### Python Tool
|
||
|
||
- Executes short Python snippets for calculations, parsing, and quick scripts.
|
||
- By default runs in a Docker-based sandbox. To run on the host, set `PYTHON_EXECUTION_BACKEND=UV` (this executes model-generated code locally; use with care).
|
||
- Ensure Docker is available if you are not using the UV backend. It is recommended to run `docker pull python:3.11` in advance.
|
||
|
||
#### Web Search Tool
|
||
|
||
- Uses the Exa backend for web search.
|
||
- Requires an Exa API key; set `EXA_API_KEY` in your environment. Create a key at `https://exa.ai`.
|
||
|
||
### Tool & Reasoning Parser
|
||
|
||
- We support OpenAI Reasoning and Tool Call parser, as well as our SGLang native api for tool call and reasoning. Refer to [reasoning parser](../advanced_features/separate_reasoning) and [tool call parser](../advanced_features/tool_parser) for more details.
|
||
|
||
|
||
## Notes
|
||
|
||
- Use **Python 3.12** for the demo tools. And install the required `gpt-oss` packages.
|
||
- The default demo integrates the web search tool (Exa backend) and a demo Python interpreter via Docker.
|
||
- For search, set `EXA_API_KEY`. For Python execution, either have Docker available or set `PYTHON_EXECUTION_BACKEND=UV`.
|
||
|
||
Examples:
|
||
```bash Command
|
||
export EXA_API_KEY=YOUR_EXA_KEY
|
||
# Optional: run Python tool locally instead of Docker (use with care)
|
||
export PYTHON_EXECUTION_BACKEND=UV
|
||
```
|
||
|
||
Launch the server with the demo tool server:
|
||
|
||
```bash Command
|
||
python3 -m sglang.launch_server \
|
||
--model-path openai/gpt-oss-120b \
|
||
--tool-server demo \
|
||
--tp 2
|
||
```
|
||
|
||
For production usage, sglang can act as an MCP client for multiple services. An [example tool server](https://github.com/openai/gpt-oss/tree/main/gpt-oss-mcp-server) is provided. Start the servers and point sglang to them:
|
||
```bash Command
|
||
mcp run -t sse browser_server.py:mcp
|
||
mcp run -t sse python_server.py:mcp
|
||
|
||
python -m sglang.launch_server ... --tool-server ip-1:port-1,ip-2:port-2
|
||
```
|
||
The URLs should be MCP SSE servers that expose server information and well-documented tools. These tools are added to the system prompt so the model can use them.
|
||
|
||
## Speculative Decoding
|
||
|
||
SGLang supports speculative decoding for GPT-OSS models using EAGLE3 algorithm. This can significantly improve decoding speed, especially for small batch sizes.
|
||
|
||
**Usage**:
|
||
Add `--speculative-algorithm EAGLE3` along with the draft model path.
|
||
```bash Command
|
||
python3 -m sglang.launch_server \
|
||
--model-path openai/gpt-oss-120b \
|
||
--speculative-algorithm EAGLE3 \
|
||
--speculative-draft-model-path lmsys/EAGLE3-gpt-oss-120b-bf16 \
|
||
--tp 2
|
||
```
|
||
|
||
<Tip>
|
||
To enable the experimental overlap scheduler for EAGLE3 speculative decoding, set the environment variable `SGLANG_ENABLE_SPEC_V2=1`. This can improve performance by enabling overlap scheduling between draft and verification stages.
|
||
</Tip>
|
||
|
||
### Quick Demo
|
||
|
||
```python Example
|
||
from openai import OpenAI
|
||
|
||
client = OpenAI(
|
||
base_url="http://localhost:30000/v1",
|
||
api_key="sk-123456"
|
||
)
|
||
|
||
tools = [
|
||
{"type": "code_interpreter"},
|
||
{"type": "web_search_preview"},
|
||
]
|
||
|
||
# Reasoning level example
|
||
response = client.responses.create(
|
||
model="openai/gpt-oss-120b",
|
||
instructions="You are a helpful assistant."
|
||
reasoning_effort="high" # Supports high, medium, or low
|
||
input="In one sentence, explain the transformer architecture.",
|
||
)
|
||
print("====== reasoning: high ======")
|
||
print(response.output_text)
|
||
|
||
# Test python tool
|
||
response = client.responses.create(
|
||
model="openai/gpt-oss-120b",
|
||
instructions="You are a helfpul assistant, you could use python tool to execute code.",
|
||
input="Use python tool to calculate the sum of 29138749187 and 29138749187", # 58,277,498,374
|
||
tools=tools
|
||
)
|
||
print("====== test python tool ======")
|
||
print(response.output_text)
|
||
|
||
# Test browser tool
|
||
response = client.responses.create(
|
||
model="openai/gpt-oss-120b",
|
||
instructions="You are a helfpul assistant, you could use browser to search the web",
|
||
input="Search the web for the latest news about Nvidia stock price",
|
||
tools=tools
|
||
)
|
||
print("====== test browser tool ======")
|
||
print(response.output_text)
|
||
```
|
||
|
||
Example output:
|
||
```text Output
|
||
====== test python tool ======
|
||
The sum of 29,138,749,187 and 29,138,749,187 is **58,277,498,374**.
|
||
====== test browser tool ======
|
||
**Recent headlines on Nvidia (NVDA) stock**
|
||
|
||
<table style={{width: "100%", borderCollapse: "collapse", tableLayout: "fixed"}}>
|
||
<colgroup>
|
||
<col style={{width: "25%"}} />
|
||
<col style={{width: "25%"}} />
|
||
<col style={{width: "25%"}} />
|
||
<col style={{width: "25%"}} />
|
||
</colgroup>
|
||
<thead>
|
||
<tr style={{borderBottom: "2px solid #d55816"}}>
|
||
<th style={{textAlign: "left", padding: "10px 12px", fontWeight: 700, whiteSpace: "nowrap", backgroundColor: "rgba(255,255,255,0.02)"}}>Date (2025)</th>
|
||
<th style={{textAlign: "left", padding: "10px 12px", fontWeight: 700, whiteSpace: "nowrap", backgroundColor: "rgba(255,255,255,0.05)"}}>Source</th>
|
||
<th style={{textAlign: "left", padding: "10px 12px", fontWeight: 700, whiteSpace: "nowrap", backgroundColor: "rgba(255,255,255,0.02)"}}>Key news points</th>
|
||
<th style={{textAlign: "left", padding: "10px 12px", fontWeight: 700, whiteSpace: "nowrap", backgroundColor: "rgba(255,255,255,0.05)"}}>Stock‑price detail</th>
|
||
</tr>
|
||
</thead>
|
||
<tbody>
|
||
<tr>
|
||
<td style={{padding: "9px 12px", fontWeight: 500, backgroundColor: "rgba(255,255,255,0.02)"}}>**May 13**</td>
|
||
<td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.05)"}}>Reuters</td>
|
||
<td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.02)"}}>The market data page shows Nvidia trading “higher” at **$116.61** with no change from the previous close.</td>
|
||
<td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.05)"}}>**$116.61** – latest trade (delayed ≈ 15 min)【14†L34-L38】</td>
|
||
</tr>
|
||
<tr>
|
||
<td style={{padding: "9px 12px", fontWeight: 500, backgroundColor: "rgba(255,255,255,0.02)"}}>**Aug 18**</td>
|
||
<td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.05)"}}>CNBC</td>
|
||
<td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.02)"}}>Morgan Stanley kept an **overweight** rating and lifted its price target to **$206** (up from $200), implying a 14 % upside from the Friday close. The firm notes Nvidia shares have already **jumped 34 % this year**.</td>
|
||
<td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.05)"}}>No exact price quoted, but the article signals strong upside expectations【9†L27-L31】</td>
|
||
</tr>
|
||
<tr>
|
||
<td style={{padding: "9px 12px", fontWeight: 500, backgroundColor: "rgba(255,255,255,0.02)"}}>**Aug 20**</td>
|
||
<td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.05)"}}>The Motley Fool</td>
|
||
<td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.02)"}}>Nvidia is set to release its Q2 earnings on Aug 27. The article lists the **current price of $175.36**, down 0.16 % on the day (as of 3:58 p.m. ET).</td>
|
||
<td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.05)"}}>**$175.36** – current price on Aug 20【10†L12-L15】【10†L53-L57】</td>
|
||
</tr>
|
||
</tbody>
|
||
</table>
|
||
|
||
**What the news tells us**
|
||
|
||
* Nvidia’s share price has risen sharply this year – up roughly a third according to Morgan Stanley – and analysts are still raising targets (now $206).
|
||
* The most recent market quote (Reuters, May 13) was **$116.61**, but the stock has surged since then, reaching **$175.36** by mid‑August.
|
||
* Upcoming earnings on **Aug 27** are a focal point; both the Motley Fool and Morgan Stanley expect the results could keep the rally going.
|
||
|
||
**Bottom line:** Nvidia’s stock is on a strong upward trajectory in 2025, with price targets climbing toward $200‑$210 and the market price already near $175 as of late August.
|
||
|
||
```
|