mirror of
https://github.com/kvcache-ai/sglang.git
synced 2026-06-30 19:57:52 +00:00
Co-authored-by: AdityaVKochar <adityavardhankochar@gmail.com> Co-authored-by: mintlify[bot] <109931778+mintlify[bot]@users.noreply.github.com> Co-authored-by: adhyan-jain <adhyanjain2006@gmail.com> Co-authored-by: Adhyan Jain <71976554+adhyan-jain@users.noreply.github.com> Co-authored-by: Maitri-shah29 <maitrirajivshah@gmail.com> Co-authored-by: Adarsh Shirawalmath <114558126+adarshxs@users.noreply.github.com> Co-authored-by: Maitri Shah <shah29maitri@gmail.com> Co-authored-by: Aditya Vardhan Kochar <80113212+AdityaVKochar@users.noreply.github.com> Co-authored-by: Rishit Shivam <164783543+pokymono@users.noreply.github.com> Co-authored-by: Rishitshivam <164783543+Rishitshivam@users.noreply.github.com> Co-authored-by: IshhanKheria <ishhankheria06@gmail.com> Co-authored-by: Ishita Joshi <ishitata.joshi@gmail.com> Co-authored-by: Richard Chen <104477092+Richardczl98@users.noreply.github.com> Co-authored-by: longGGGGGG <553746008@qq.com> Co-authored-by: Richard <richardchen@radixark.ai> Co-authored-by: Nakul Sinha <nakul.new4socials@gmail.com> Co-authored-by: Divyam Agrawal <ludicrouslytrue@gmail.com> Co-authored-by: Richardczl98 <Zhenlinc@stanford.edu> Co-authored-by: Krishang Zinzuwadia <krishangzinzuwadia@gmail.com> Co-authored-by: nimeshas <nimesha.s106@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Jignas Paturu <86356085+JignasP@users.noreply.github.com> Co-authored-by: zijiexia <37504505+zijiexia@users.noreply.github.com>
39 lines
1.7 KiB
Plaintext
---
title: "Observability"
metatags:
description: "SGLang observability: Prometheus metrics, request logging, request dump and replay, crash dump debugging."
---
## Production Metrics
SGLang exposes production metrics via Prometheus. To enable them, add `--enable-metrics` when launching the server.
Once enabled, you can query them with:
```bash Command
curl http://localhost:30000/metrics
```

See [Production Metrics](../references/production_metrics) and [Production Request Tracing](../references/production_request_trace) for more details.

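The `/metrics` endpoint queried above can also be read from a script. The following is a minimal sketch, assuming the endpoint returns the standard Prometheus text exposition format and the server listens on `http://localhost:30000` as in the `curl` example. The helper names `parse_prometheus_text` and `fetch_metrics` are hypothetical and not part of SGLang.

```python
from urllib.request import urlopen


def parse_prometheus_text(text):
    """Parse Prometheus text exposition into (series, value) pairs.

    `series` keeps the metric name together with its label set, if any.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        if "}" in line:
            # Labels may contain spaces, so split after the closing brace.
            end = line.rindex("}") + 1
            series, rest = line[:end], line[end:].split()
        else:
            parts = line.split()
            series, rest = parts[0], parts[1:]
        if not rest:
            continue  # Malformed line without a value.
        # rest[0] is the sample value; an optional timestamp may follow.
        samples.append((series, float(rest[0])))
    return samples


def fetch_metrics(base_url="http://localhost:30000"):
    """Fetch and parse metrics from a server started with --enable-metrics."""
    # The base URL is an assumption taken from the curl example.
    with urlopen(f"{base_url}/metrics", timeout=5) as resp:
        return parse_prometheus_text(resp.read().decode("utf-8"))
```

Calling `fetch_metrics()` against a running server returns a list of `(series, value)` tuples that you can filter or print. For production monitoring, a Prometheus scraper is the more usual choice; this sketch is only meant for quick inspection.
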
## Logging

By default, SGLang does not log any request contents. To log them, use `--log-requests`.
To control the verbosity, use `--log-request-level`.
See [Logging](./server_arguments#logging) for more details.

## Request Dump and Replay

You can dump all requests and replay them later for benchmarking or other purposes.

To start dumping, run the following command, which sends a request to a running server:
```bash Command
python3 -m sglang.srt.managers.configure_logging --url http://localhost:30000 --dump-requests-folder /tmp/sglang_request_dump --dump-requests-threshold 100
```
With this setting, the server writes the requests to a pickle file every 100 requests.

To replay the request dump, use `scripts/playground/replay_request_dump.py`.

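Before replaying, it can help to check what the dump folder contains. The following is a minimal sketch, assuming every file in the folder is one of the pickle files described above. The internal layout of the dumped objects is not documented here, so the sketch reports only each object's type and length. The helper names are hypothetical, and unpickling may require SGLang to be importable in the same environment.

```python
import pickle
from pathlib import Path


def load_request_dump(path):
    """Load one dump file and return the unpickled object."""
    # Only unpickle files you trust: pickle can execute arbitrary code.
    with open(path, "rb") as f:
        return pickle.load(f)


def summarize_dump_folder(folder):
    """Return {file name: (type name, length or None)} for each file."""
    summary = {}
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        obj = load_request_dump(path)
        # Not every object has a length, so fall back to None.
        length = len(obj) if hasattr(obj, "__len__") else None
        summary[path.name] = (type(obj).__name__, length)
    return summary
```

For example, `summarize_dump_folder("/tmp/sglang_request_dump")` inspects the folder used in the command above.
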
## Crash Dump and Replay
If the server crashes, you may want to debug the cause.
SGLang supports crash dumping, which saves all requests from the 5 minutes before the crash so that you can replay them and debug the cause later.

To enable crash dumping, use `--crash-dump-folder /tmp/crash_dump`.
To replay the crash dump, use `scripts/playground/replay_request_dump.py`.