mirror of
https://github.com/kvcache-ai/sglang.git
synced 2026-06-30 19:57:52 +00:00
Co-authored-by: AdityaVKochar <adityavardhankochar@gmail.com> Co-authored-by: mintlify[bot] <109931778+mintlify[bot]@users.noreply.github.com> Co-authored-by: adhyan-jain <adhyanjain2006@gmail.com> Co-authored-by: Adhyan Jain <71976554+adhyan-jain@users.noreply.github.com> Co-authored-by: Maitri-shah29 <maitrirajivshah@gmail.com> Co-authored-by: Adarsh Shirawalmath <114558126+adarshxs@users.noreply.github.com> Co-authored-by: Maitri Shah <shah29maitri@gmail.com> Co-authored-by: Aditya Vardhan Kochar <80113212+AdityaVKochar@users.noreply.github.com> Co-authored-by: Rishit Shivam <164783543+pokymono@users.noreply.github.com> Co-authored-by: Rishitshivam <164783543+Rishitshivam@users.noreply.github.com> Co-authored-by: IshhanKheria <ishhankheria06@gmail.com> Co-authored-by: Ishita Joshi <ishitata.joshi@gmail.com> Co-authored-by: Richard Chen <104477092+Richardczl98@users.noreply.github.com> Co-authored-by: longGGGGGG <553746008@qq.com> Co-authored-by: Richard <richardchen@radixark.ai> Co-authored-by: Nakul Sinha <nakul.new4socials@gmail.com> Co-authored-by: Divyam Agrawal <ludicrouslytrue@gmail.com> Co-authored-by: Richardczl98 <Zhenlinc@stanford.edu> Co-authored-by: Krishang Zinzuwadia <krishangzinzuwadia@gmail.com> Co-authored-by: nimeshas <nimesha.s106@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Jignas Paturu <86356085+JignasP@users.noreply.github.com> Co-authored-by: zijiexia <37504505+zijiexia@users.noreply.github.com>
88 lines
2.3 KiB
Plaintext
---
title: Supported models
description: See which families of SGLang-compatible models are actively maintained.
mode: wide
---

SGLang supports model families across text generation, retrieval, and reward workflows. Browse the sections below by category, then open a card's detail page to see the specific models in that class.

### Text generation

<CardGroup cols={3}>
  <Card
    title="Large language models"
    mode="card"
    className="max-w-sm mx-auto"
    href="./supported-models/large-language-models"
    img="/cards/LLM-card.png"
  >
    Production-tuned Llama and Qwen families validated for high-throughput
    serving.
  </Card>
  <Card
    title="Vision language models"
    mode="card"
    className="max-w-sm mx-auto"
    href="./supported-models/vision-language-models"
    img="/cards/VLM-card.png"
  >
    Vision-text hybrids that stay responsive on multi-GPU setups.
  </Card>
  <Card
    title="Diffusion language models"
    mode="card"
    className="max-w-sm mx-auto"
    href="./sglang-diffusion/index"
    img="/cards/dLLM-card.png"
  >
    Score-based and diffusion backbones for structured text generation
    workflows.
  </Card>
</CardGroup>

### Retrieval and ranking

<CardGroup cols={3}>
  <Card
    title="Embedding models"
    mode="card"
    className="max-w-sm mx-auto"
    href="./supported-models/embedding-models"
    img="/cards/Embedding-card.png"
  >
    Dense and sparse embeddings optimized with FlashInfer kernels.
  </Card>
  <Card
    title="Rerank models"
    mode="card"
    className="max-w-sm mx-auto"
    href="./supported-models/rerank-models"
    img="/cards/Rerank-card.png"
  >
    Low-latency rerankers for multi-stage retrieval pipelines.
  </Card>
  <Card
    title="Classification models"
    mode="card"
    className="max-w-sm mx-auto"
    href="./supported-models/classification-models"
    img="/cards/Classification-card.png"
  >
    Lightweight classifiers covering safety, intent, and context filters.
  </Card>
</CardGroup>

### Specialized models

<CardGroup cols={3}>
  <Card
    title="Reward models"
    mode="card"
    className="max-w-sm mx-auto"
    href="./supported-models/reward-models"
    img="/cards/Reward-card.png"
  >
    RLHF and reward scoring pipelines optimized for production latency.
  </Card>
</CardGroup>