--- title: "Ollama-Compatible API" metatags: description: "SGLang provides Ollama API compatibility, allowing you to use the Ollama CLI and Python library with SGLang as the inference backend." --- SGLang provides Ollama API compatibility, allowing you to use the Ollama CLI and Python library with SGLang as the inference backend. ## Prerequisites ```bash Command # Install the Ollama Python library (for Python client usage) pip install ollama ``` You don't need the Ollama server installed - SGLang acts as the backend. You only need the `ollama` CLI or Python library as the client. ## Endpoints
Endpoint Method Description
`/` GET, HEAD Health check for Ollama CLI
`/api/tags` GET List available models
`/api/chat` POST Chat completions (streaming & non-streaming)
`/api/generate` POST Text generation (streaming & non-streaming)
`/api/show` POST Model information
## Quick Start ### 1. Launch SGLang Server ```bash Command python -m sglang.launch_server \ --model Qwen/Qwen2.5-1.5B-Instruct \ --port 30001 \ --host 0.0.0.0 ``` The model name used with `ollama run` must match exactly what you passed to `--model`. ### 2. Use Ollama CLI ```bash Command # List available models OLLAMA_HOST=http://localhost:30001 ollama list # Interactive chat OLLAMA_HOST=http://localhost:30001 ollama run "Qwen/Qwen2.5-1.5B-Instruct" ``` If connecting to a remote server behind a firewall: ```bash Command # SSH tunnel ssh -L 30001:localhost:30001 user@gpu-server -N & # Then use Ollama CLI as above OLLAMA_HOST=http://localhost:30001 ollama list ``` ### 3. Use Ollama Python Library ```python Example import ollama client = ollama.Client(host='http://localhost:30001') # Non-streaming response = client.chat( model='Qwen/Qwen2.5-1.5B-Instruct', messages=[{'role': 'user', 'content': 'Hello!'}] ) print(response['message']['content']) # Streaming stream = client.chat( model='Qwen/Qwen2.5-1.5B-Instruct', messages=[{'role': 'user', 'content': 'Tell me a story'}], stream=True ) for chunk in stream: print(chunk['message']['content'], end='', flush=True) ``` ## Smart Router For intelligent routing between local Ollama (fast) and remote SGLang (powerful) using an LLM judge, see the [Smart Router documentation](https://github.com/sgl-project/sglang/blob/main/python/sglang/srt/entrypoints/ollama/README). ## Summary
Component Purpose
**Ollama API** Familiar CLI/API that developers already know
**SGLang Backend** High-performance inference engine
**Smart Router** Intelligent routing - fast local for simple tasks, powerful remote for complex tasks