Files
sglang/docs_new/docs/basic_usage/openai_api_embeddings.mdx
Mingyi a3291b5654 Add new Mintlify documentation site (docs_new/) (#23001)
Co-authored-by: AdityaVKochar <adityavardhankochar@gmail.com>
Co-authored-by: mintlify[bot] <109931778+mintlify[bot]@users.noreply.github.com>
Co-authored-by: adhyan-jain <adhyanjain2006@gmail.com>
Co-authored-by: Adhyan Jain <71976554+adhyan-jain@users.noreply.github.com>
Co-authored-by: Maitri-shah29 <maitrirajivshah@gmail.com>
Co-authored-by: Adarsh Shirawalmath <114558126+adarshxs@users.noreply.github.com>
Co-authored-by: Maitri Shah <shah29maitri@gmail.com>
Co-authored-by: Aditya Vardhan Kochar <80113212+AdityaVKochar@users.noreply.github.com>
Co-authored-by: Rishit Shivam <164783543+pokymono@users.noreply.github.com>
Co-authored-by: Rishitshivam <164783543+Rishitshivam@users.noreply.github.com>
Co-authored-by: IshhanKheria <ishhankheria06@gmail.com>
Co-authored-by: Ishita Joshi <ishitata.joshi@gmail.com>
Co-authored-by: Richard Chen <104477092+Richardczl98@users.noreply.github.com>
Co-authored-by: longGGGGGG <553746008@qq.com>
Co-authored-by: Richard <richardchen@radixark.ai>
Co-authored-by: Nakul Sinha <nakul.new4socials@gmail.com>
Co-authored-by: Divyam Agrawal <ludicrouslytrue@gmail.com>
Co-authored-by: Richardczl98 <Zhenlinc@stanford.edu>
Co-authored-by: Krishang Zinzuwadia <krishangzinzuwadia@gmail.com>
Co-authored-by: nimeshas <nimesha.s106@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Jignas Paturu <86356085+JignasP@users.noreply.github.com>
Co-authored-by: zijiexia <37504505+zijiexia@users.noreply.github.com>
2026-04-20 15:10:22 -07:00

127 lines
3.1 KiB
Plaintext

---
title: "OpenAI APIs - Embedding"
metatags:
description: "This tutorial covers the embedding APIs for embedding models."
---
SGLang provides OpenAI-compatible APIs to enable a smooth transition from OpenAI services to self-hosted local models.
A complete reference for the API is available in the [OpenAI API Reference](https://platform.openai.com/docs/guides/embeddings).
This tutorial covers the embedding APIs for embedding models. For a list of the supported models see the [corresponding overview page](../supported-models)
## Launch A Server
Launch the server in your terminal and wait for it to initialize. Remember to add `--is-embedding` to the command.
```python Example
from sglang.test.doc_patch import launch_server_cmd
from sglang.utils import wait_for_server, print_highlight, terminate_process
embedding_process, port = launch_server_cmd(
"""
python3 -m sglang.launch_server --model-path Alibaba-NLP/gte-Qwen2-1.5B-instruct \
--host 0.0.0.0 --is-embedding --log-level warning
"""
)
wait_for_server(f"http://localhost:{port}")
```
## Using cURL
```python Example
import subprocess, json
text = "Once upon a time"
curl_text = f"""curl -s http://localhost:{port}/v1/embeddings \
-H "Content-Type: application/json" \
-d '{{"model": "Alibaba-NLP/gte-Qwen2-1.5B-instruct", "input": "{text}"}}'"""
result = subprocess.check_output(curl_text, shell=True)
print(result)
text_embedding = json.loads(result)["data"][0]["embedding"]
print_highlight(f"Text embedding (first 10): {text_embedding[:10]}")
```
## Using Python Requests
```python Example
import requests
text = "Once upon a time"
response = requests.post(
f"http://localhost:{port}/v1/embeddings",
json={"model": "Alibaba-NLP/gte-Qwen2-1.5B-instruct", "input": text},
)
text_embedding = response.json()["data"][0]["embedding"]
print_highlight(f"Text embedding (first 10): {text_embedding[:10]}")
```
## Using OpenAI Python Client
```python Example
import openai
client = openai.Client(base_url=f"http://127.0.0.1:{port}/v1", api_key="None")
# Text embedding example
response = client.embeddings.create(
model="Alibaba-NLP/gte-Qwen2-1.5B-instruct",
input=text,
)
embedding = response.data[0].embedding[:10]
print_highlight(f"Text embedding (first 10): {embedding}")
```
## Using Input IDs
SGLang also supports `input_ids` as input to get the embedding.
```python Example
import json
import os
from transformers import AutoTokenizer
os.environ["TOKENIZERS_PARALLELISM"] = "false"
tokenizer = AutoTokenizer.from_pretrained("Alibaba-NLP/gte-Qwen2-1.5B-instruct")
input_ids = tokenizer.encode(text)
curl_ids = f"""curl -s http://localhost:{port}/v1/embeddings \
-H "Content-Type: application/json" \
-d '{{"model": "Alibaba-NLP/gte-Qwen2-1.5B-instruct", "input": {json.dumps(input_ids)}}}'"""
input_ids_embedding = json.loads(subprocess.check_output(curl_ids, shell=True))["data"][
0
]["embedding"]
print_highlight(f"Input IDs embedding (first 10): {input_ids_embedding[:10]}")
```
```python Example
terminate_process(embedding_process)
```
## Multi-Modal Embedding Model
Please refer to [Multi-Modal Embedding Model](../supported-models)