mirror of
https://github.com/kvcache-ai/sglang.git
synced 2026-06-30 19:57:52 +00:00
Co-authored-by: AdityaVKochar <adityavardhankochar@gmail.com> Co-authored-by: mintlify[bot] <109931778+mintlify[bot]@users.noreply.github.com> Co-authored-by: adhyan-jain <adhyanjain2006@gmail.com> Co-authored-by: Adhyan Jain <71976554+adhyan-jain@users.noreply.github.com> Co-authored-by: Maitri-shah29 <maitrirajivshah@gmail.com> Co-authored-by: Adarsh Shirawalmath <114558126+adarshxs@users.noreply.github.com> Co-authored-by: Maitri Shah <shah29maitri@gmail.com> Co-authored-by: Aditya Vardhan Kochar <80113212+AdityaVKochar@users.noreply.github.com> Co-authored-by: Rishit Shivam <164783543+pokymono@users.noreply.github.com> Co-authored-by: Rishitshivam <164783543+Rishitshivam@users.noreply.github.com> Co-authored-by: IshhanKheria <ishhankheria06@gmail.com> Co-authored-by: Ishita Joshi <ishitata.joshi@gmail.com> Co-authored-by: Richard Chen <104477092+Richardczl98@users.noreply.github.com> Co-authored-by: longGGGGGG <553746008@qq.com> Co-authored-by: Richard <richardchen@radixark.ai> Co-authored-by: Nakul Sinha <nakul.new4socials@gmail.com> Co-authored-by: Divyam Agrawal <ludicrouslytrue@gmail.com> Co-authored-by: Richardczl98 <Zhenlinc@stanford.edu> Co-authored-by: Krishang Zinzuwadia <krishangzinzuwadia@gmail.com> Co-authored-by: nimeshas <nimesha.s106@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Jignas Paturu <86356085+JignasP@users.noreply.github.com> Co-authored-by: zijiexia <37504505+zijiexia@users.noreply.github.com>
---
title: "Contribution Guide"
mode: wide
metatags:
  description: "SGLang contribution guide: source install, pre-commit, unit tests, CI triggers, code style, sgl-kernel updates."
---
Welcome to **SGLang**! We appreciate your interest in contributing. This guide provides a concise overview of how to set up your environment, run tests, build documentation, and open a Pull Request (PR). Whether you're fixing a small bug or developing a major feature, we encourage following these steps for a smooth contribution process.

## Install SGLang from Source

### Fork and clone the repository

**Note**: New contributors do **not** have write permission to push to the official SGLang repo. Please fork the repository under your GitHub account, then clone your fork locally.

```bash
git clone https://github.com/<your_user_name>/sglang.git
```

### Build from source

Refer to [Install SGLang from Source](../get-started/installation).

## Format code with pre-commit

We use [pre-commit](https://pre-commit.com/) to enforce consistent code style. Before pushing your changes, please run:

```bash
pip3 install pre-commit
pre-commit install
pre-commit run --all-files
```

- **`pre-commit run --all-files`** manually runs all configured checks, applying fixes if possible. If it fails the first time, re-run it to ensure lint errors are fully resolved. Make sure your code passes all checks **before** creating a Pull Request.
- **Do not commit** directly to the `main` branch. Always create a new branch (e.g., `feature/my-new-feature`), push your changes, and open a PR from that branch.

## Run and add unit tests

If you add a new feature or fix a bug, please add corresponding unit tests to ensure coverage and prevent regression.
SGLang uses Python's built-in [unittest](https://docs.python.org/3/library/unittest.html) framework.
For detailed instructions on running tests and integrating them into CI, refer to [test/README](https://github.com/sgl-project/sglang/tree/main/test/README).

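As a rough illustration of the shape such a test can take with `unittest`, the sketch below covers a small stand-in helper. `clamp_max_new_tokens` is a hypothetical function invented for this example and is not part of SGLang; real tests should follow the layout described in test/README.

```python
import unittest


def clamp_max_new_tokens(requested: int, limit: int) -> int:
    """Hypothetical helper (not an SGLang API): cap a requested token count."""
    if requested < 0:
        raise ValueError("requested must be non-negative")
    return min(requested, limit)


class TestClampMaxNewTokens(unittest.TestCase):
    def test_below_limit_is_unchanged(self):
        self.assertEqual(clamp_max_new_tokens(16, 128), 16)

    def test_above_limit_is_capped(self):
        self.assertEqual(clamp_max_new_tokens(512, 128), 128)

    def test_negative_request_is_rejected(self):
        # Regression-style test: pins down behavior that a bug fix introduced.
        with self.assertRaises(ValueError):
            clamp_max_new_tokens(-1, 128)
```

A file like this can be run with `python3 -m unittest <path_to_test_file>`.
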
## Write documentation

We recommend that new contributors start by writing documentation, which helps you quickly understand the SGLang codebase.
For more details, please refer to [docs/README](https://github.com/sgl-project/sglang/blob/main/docs/README.md).

## Test the accuracy
If your code changes the model output, please run the accuracy tests. A quick sanity check is the few-shot GSM8K.

```bash
# Launch a server
python3 -m sglang.launch_server --model Qwen/Qwen2-7B-Instruct

# Evaluate
python3 -m sglang.test.few_shot_gsm8k --num-questions 200
```

Please note that the above script is primarily a sanity check, not a rigorous accuracy or speed test.
This test can have significant variance (1%-5%) in accuracy due to batching and the non-deterministic nature of the inference engine.
Also, do not rely on the "Latency/Output throughput" from this script, as it is not a proper speed test.

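Because a single run can move by a few points, one way to read the results is to repeat the evaluation several times and compare the change in mean accuracy against the run-to-run spread. The sketch below assumes you have collected the accuracy values yourself; the function name and the default two-standard-deviation threshold are illustrative assumptions, not part of SGLang's tooling.

```python
from statistics import mean, stdev


def looks_like_regression(
    baseline: list[float], candidate: list[float], num_sigmas: float = 2.0
) -> bool:
    """Return True if the candidate mean is below the baseline mean by more
    than `num_sigmas` times the baseline's sample standard deviation."""
    if len(baseline) < 2 or not candidate:
        raise ValueError("need at least two baseline runs and one candidate run")
    noise = stdev(baseline)  # run-to-run spread of the unchanged code
    return mean(baseline) - mean(candidate) > num_sigmas * noise
```

A drop that stays inside the measured noise is not evidence of a regression, which is why a single GSM8K number should be treated only as a sanity check.
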
GSM8K is too easy for state-of-the-art models nowadays. Please try your own more challenging accuracy tests.
You can find additional accuracy eval examples in:
- [test_eval_accuracy_large.py](https://github.com/sgl-project/sglang/blob/main/test/srt/test_eval_accuracy_large.py)
- [test_gpt_oss_1gpu.py](https://github.com/sgl-project/sglang/blob/main/test/srt/test_gpt_oss_1gpu.py)

## Benchmark the speed
Refer to [Benchmark and Profiling](../developer_guide/benchmark_and_profiling).

## Requesting a review for merge
You can follow the pull request merge process described in [MAINTAINER](https://github.com/sgl-project/sglang/blob/main/.github/MAINTAINER).
You will need to work with the Merge Oncall, Codeowner, and other reviewers to get their approvals.
Then your PR can be merged.

## How to Trigger CI Tests

We have a lot of open PRs but limited CI machines, so only top and trusted contributors have permission to trigger CI tests.
Users with permission are listed in [CI_PERMISSIONS.json](https://github.com/sgl-project/sglang/blob/main/.github/CI_PERMISSIONS.json).

For CI to run on a pull request, it must have the "run-ci" label. Authorized users can add the label or rerun failed tests by commenting on the PR with one of these commands:

- `/tag-run-ci-label`: Adds the "run-ci" label. Every future commit will trigger CI.
- `/rerun-failed-ci`: Reruns the failed or flaky tests from the most recent commit.
- `/tag-and-rerun-ci`: A single command that performs both `/tag-run-ci-label` and `/rerun-failed-ci`.
- `/rerun-stage <stage-name>`: Reruns a specific test stage without waiting for its dependencies. This is useful when you want to quickly validate a fix for a specific test failure instead of waiting ~30 minutes for preceding stages to complete.

If you have permission, the [Slash Command Handler](https://github.com/sgl-project/sglang/actions/workflows/slash-command-handler.yml) will run your command and react with a +1 to your comment. It may take up to a few minutes for the reaction to appear. Here's a usage [example](https://github.com/sgl-project/sglang/pull/14253#issuecomment-3599509302).

To avoid spamming a PR with too many `/rerun-failed-ci` comments, you can also trigger the command by editing an existing comment and adding any suffix (e.g., `/rerun-failed-ci try again`).

Example of rerunning a single test stage: `/rerun-stage unit-test-backend-4-gpu`.

If you don't have permission, please ask maintainers to trigger CI for you.

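To make the comment formats above concrete, here is a sketch of how a comment could be split into a command and an optional stage name. It is illustrative only and is not the code of the actual Slash Command Handler; `parse_ci_command` and its parsing rules (first line only, trailing text ignored) are assumptions.

```python
from typing import Optional

# Commands listed in this guide; the parsing rules below are assumptions.
KNOWN_COMMANDS = {
    "/tag-run-ci-label",
    "/rerun-failed-ci",
    "/tag-and-rerun-ci",
    "/rerun-stage",
}


def parse_ci_command(comment: str) -> Optional[tuple[str, Optional[str]]]:
    """Return (command, stage_name), or None if the comment is not a known command."""
    stripped = comment.strip()
    if not stripped:
        return None
    tokens = stripped.splitlines()[0].split()
    command = tokens[0]
    if command not in KNOWN_COMMANDS:
        return None
    if command == "/rerun-stage":
        # A stage name is required, e.g. "unit-test-backend-4-gpu".
        if len(tokens) < 2:
            return None
        return command, tokens[1]
    # Any suffix (e.g. "try again") is ignored for the other commands.
    return command, None
```

Under these assumptions, `/rerun-failed-ci try again` and `/rerun-failed-ci` parse to the same command, which matches the tip about editing an existing comment with a suffix.
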
### CI rate limits

Due to CI scheduling and limited resources, higher-priority PRs may preempt running jobs. In such cases, you may need to rerun the tests.

We apply CI rate limits to prevent abuse and ensure fair usage of our CI resources.

Each CI workflow has a default limit defined in its workflow configuration file. For example, in [pr-gate.yml](https://github.com/sgl-project/sglang/blob/main/.github/workflows/pr-gate.yml), the default cooldown period is 120 minutes, and each workflow can override it via the `cool-down-minutes` input parameter:

```yaml Config
cool-down-minutes:
  description: "Default cooldown period in minutes; 0 disables rate limiting"
  type: number
  default: 120
```

Users listed in [CI_PERMISSIONS.json](https://github.com/sgl-project/sglang/blob/main/.github/CI_PERMISSIONS.json) may have a per-user cooldown interval. In practice, we use the minimum of the workflow's default window and the user-specific interval.

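The "minimum of the two intervals" rule can be written down as follows. This is a sketch of the rule as described above, not the code used in the CI workflows; both function names and the handling of a missing per-user interval are assumptions.

```python
from datetime import datetime, timedelta
from typing import Optional


def effective_cooldown_minutes(
    workflow_default: int, user_interval: Optional[int] = None
) -> int:
    """Minimum of the workflow default and the per-user interval, if one exists."""
    if user_interval is None:
        return workflow_default
    return min(workflow_default, user_interval)


def can_trigger(
    last_run: Optional[datetime], now: datetime, cooldown_minutes: int
) -> bool:
    """True if there is no earlier run, rate limiting is disabled (0),
    or the cooldown has fully elapsed since the last run."""
    if last_run is None or cooldown_minutes == 0:
        return True
    return now - last_run >= timedelta(minutes=cooldown_minutes)
```

For example, with the 120-minute default and a hypothetical 30-minute per-user interval, the effective cooldown would be 30 minutes.
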
## Code style guidance
- Avoid code duplication. If the same code snippet (more than five lines) appears multiple times, extract it into a shared function.
- Minimize device synchronization. Reduce expensive CPU-GPU synchronization operations, such as `tensor.item()` or `tensor.cpu()`, whenever possible. Use vectorized code.
- Prioritize extreme efficiency. SGLang is a runtime, and most of your code runs on the critical path for every request. Optimize all minor overheads as much as possible, especially in the model forward code.
  - A common pattern is some runtime checks in the model forward pass (e.g., [this](https://github.com/sgl-project/sglang/blob/f1b0eda55c2c4838e8ab90a0fac7fb1e3d7064ab/python/sglang/srt/models/deepseek_v2.py#L486-L491)). These are very likely the same for every layer. Please cache the result as a single boolean value whenever possible.
- Make functions as pure as possible. Avoid in-place modification of arguments.
- Keep files concise. If a file exceeds 2,000 lines of code, split it into multiple smaller files (e.g., `scheduler.py` and `scheduler_output_processor_mixin.py`).
- Keep tests fast.
  - If a single test file runs longer than 500 seconds, split it into multiple smaller files (e.g., `test_eagle_infer_a.py`, `test_eagle_infer_b.py`).
  - If a single job in a GitHub workflow runs longer than 30 minutes, split it into smaller jobs/steps.
  - Reuse server launches in your unit tests to make tests run faster.
- When supporting new hardware or features, follow these guidelines:
  - Do not drastically change existing code.
  - Always prefer new files to introduce specific components for your new hardware (e.g., `allocator_ascend.py`).
  - If you write multiple if/else blocks for new features, ensure the common path (e.g., NVIDIA hardware or the existing code path) is the first branch.

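Two of the guidelines above, caching a per-layer runtime check and keeping functions pure, can be sketched in plain Python. All names here (`Layer`, `use_fast_path`, `scale`, the config keys) are hypothetical and do not come from the SGLang codebase; the check is a stand-in for whatever condition a model evaluates in its forward pass.

```python
class Layer:
    """Hypothetical model layer; names and config keys are illustrative only."""

    def __init__(self, config: dict):
        # Evaluate the runtime check once at construction time and keep the
        # result as a single boolean, instead of re-evaluating it per forward.
        self.use_fast_path = (
            config.get("backend") == "fast" and config.get("head_dim", 0) % 8 == 0
        )

    def forward(self, x: list[float]) -> list[float]:
        if self.use_fast_path:  # common path is the first branch
            return [v * 2.0 for v in x]
        return [v + v for v in x]


def scale(values: list[float], factor: float) -> list[float]:
    """Pure function: returns a new list and leaves `values` untouched."""
    return [v * factor for v in values]
```

The forward pass then reads one precomputed attribute per call, and callers of `scale` can rely on their input not being modified.
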
## How to update sgl-kernel
Since sglang and sgl-kernel are separate Python packages, our current GitHub CI infrastructure does not support updating a kernel and using it immediately within the same pull request (PR).
To add a new kernel or modify an existing one in the sgl-kernel package, you must use multiple PRs.

Follow these steps:

1. Submit a PR to update the sgl-kernel source code without using it in the sglang Python package (e.g., [#8884](https://github.com/sgl-project/sglang/pull/8884/files)).
2. Bump the version of sgl-kernel (e.g., [#9220](https://github.com/sgl-project/sglang/pull/9220/files)).
   - Once merged, this will trigger an automatic release of the sgl-kernel wheel to PyPI.
   - If not urgent, you can wait for other people to release the wheel. A new version will typically be released within one week.
3. Apply the changes:
   - Update the sgl-kernel version in `sglang/python/pyproject.toml` to use the modified kernels.
   - Update the related caller code in sglang to use the new kernel.

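For the last step, a quick local check that a dependency list pins the version you expect might look like the sketch below. It assumes the requirement is written as an exact `==` pin, and `pinned_version` is a hypothetical helper; how the requirement actually appears in `pyproject.toml` should be confirmed in the file itself.

```python
import re
from typing import Optional


def pinned_version(dependencies: list[str], package: str) -> Optional[str]:
    """Return the version from a `package==X.Y.Z` requirement, or None
    if the package is absent or not pinned with `==`."""
    pattern = re.compile(rf"^\s*{re.escape(package)}\s*==\s*([^\s;,]+)")
    for requirement in dependencies:
        match = pattern.match(requirement)
        if match:
            return match.group(1)
    return None
```

The list of requirement strings can be taken from wherever the project declares its dependencies; the helper only inspects the strings it is given.
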
## Tips for newcomers

If you want to contribute but don't have a specific idea in mind, pick issues labeled ["good first issue" or "help wanted"](https://github.com/sgl-project/sglang/issues?q=is%3Aissue+label%3A%22good+first+issue%22%2C%22help+wanted%22). These tasks typically have lower complexity and provide an excellent introduction to the codebase. Also check out this [code walk-through](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/tree/main/sglang/code-walk-through) for a deeper look into SGLang's workflow.

If you have any questions or want to start a discussion, please feel free to ask in our [Slack channel](https://slack.sglang.io).

Thank you for your interest in SGLang. Happy coding!