mirror of
https://github.com/kvcache-ai/sglang.git
synced 2026-06-30 19:57:52 +00:00
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
185 lines
12 KiB
Markdown
# Contribution Guide

Welcome to **SGLang**! We appreciate your interest in contributing. This guide provides a concise overview of how to set up your environment, run tests, build documentation, and open a Pull Request (PR). Whether you’re fixing a small bug or developing a major feature, we encourage following these steps for a smooth contribution process.

## Install SGLang from Source

### Fork and clone the repository

**Note**: New contributors do **not** have write permission to push to the official SGLang repo. Please fork the repository under your GitHub account, then clone your fork locally.

```bash
git clone https://github.com/<your_user_name>/sglang.git
```

### Build from source

Refer to [Install SGLang from Source](../get_started/install.md#method-2-from-source).

## Format code with pre-commit

We use [pre-commit](https://pre-commit.com/) to enforce a consistent code style. Before pushing your changes, please run:

```bash
pip3 install pre-commit
pre-commit install
pre-commit run --all-files
```

- **`pre-commit run --all-files`** manually runs all configured checks, applying fixes if possible. If it fails the first time, re-run it to ensure lint errors are fully resolved. Make sure your code passes all checks **before** creating a Pull Request.
- **Do not commit** directly to the `main` branch. Always create a new branch (e.g., `feature/my-new-feature`), push your changes, and open a PR from that branch.
- Link checking with lychee is **enforced in CI**. By default, it does not block local commits.
- To run local link checks manually, use: `pre-commit run --hook-stage manual lychee --all-files`.

## Run and add unit tests

If you add a new feature or fix a bug, please add corresponding unit tests to ensure coverage and prevent regression.

### Unit tests (no server required)

Unit tests live under [`test/registered/unit/`](https://github.com/sgl-project/sglang/tree/main/test/registered/unit), organized to mirror the `python/sglang/srt/` source tree. These tests validate component logic **without** launching a server or loading real model weights.
SGLang uses Python's built-in [unittest](https://docs.python.org/3/library/unittest.html) framework with [pytest](https://docs.pytest.org/) as the test runner.

**When to add a unit test:** If you modify a file under `python/sglang/srt/`, check whether a corresponding test exists in `test/registered/unit/` and add coverage for your changes. For example:

```
srt/mem_cache/radix_cache.py        → unit/mem_cache/test_radix_cache.py
srt/sampling/sampling_params.py     → unit/sampling/test_sampling_params.py
```

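The mirroring convention shown in the mapping above can be sketched as a small path helper. This is only an illustration: the function name `expected_unit_test_path` is hypothetical and not part of SGLang; only the two directory roots come from this guide.

```python
from pathlib import PurePosixPath

# Roots taken from this guide; the helper below is a hypothetical illustration.
SRC_ROOT = PurePosixPath("python/sglang/srt")
TEST_ROOT = PurePosixPath("test/registered/unit")


def expected_unit_test_path(source_file: str) -> str:
    """Return where the mirrored unit test for `source_file` would live.

    Raises ValueError if `source_file` is not under python/sglang/srt/.
    """
    relative = PurePosixPath(source_file).relative_to(SRC_ROOT)
    return str(TEST_ROOT / relative.parent / f"test_{relative.name}")


print(expected_unit_test_path("python/sglang/srt/mem_cache/radix_cache.py"))
# prints: test/registered/unit/mem_cache/test_radix_cache.py
```

If the printed path does not exist yet, that is a hint that a new test file may be needed for the module you changed.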
**Run unit tests locally:**

```bash
pytest test/registered/unit/ -v                    # all unit tests
pytest test/registered/unit/mem_cache/ -v          # one module
```

**Run with coverage:**

```bash
pytest test/registered/unit/ --cov --cov-config=.coveragerc -v
```

For conventions on CI registration, test structure, and examples, see [`test/registered/unit/README.md`](https://github.com/sgl-project/sglang/tree/main/test/registered/unit/README.md).

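As a rough sketch of the `unittest` style that pytest can collect, the example below tests a small pure function with no server and no model weights. The function `clamp_temperature` is a made-up stand-in for whatever component you are testing, and the sketch does not cover the CI registration conventions described in the README linked above.

```python
import unittest


def clamp_temperature(value: float, low: float = 0.0, high: float = 2.0) -> float:
    """Hypothetical stand-in for the component under test."""
    return max(low, min(high, value))


class TestClampTemperature(unittest.TestCase):
    def test_value_in_range_is_unchanged(self):
        self.assertEqual(clamp_temperature(0.7), 0.7)

    def test_values_outside_range_are_clamped(self):
        self.assertEqual(clamp_temperature(-1.0), 0.0)
        self.assertEqual(clamp_temperature(5.0), 2.0)
```

Saved in a file named like `test_<module>.py`, a class of this shape is discovered by `pytest` because it subclasses `unittest.TestCase`.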
### E2E tests (server required)

For tests that require launching a server, refer to [`test/registered/README.md`](https://github.com/sgl-project/sglang/tree/main/test/registered/README.md) for guidance on where to place your test.

For detailed instructions on running tests and integrating them into CI, refer to [test/README.md](https://github.com/sgl-project/sglang/tree/main/test/README.md).

## Write documentation

We recommend that new contributors start by writing documentation, which helps you quickly understand the SGLang codebase.
For more details, please refer to [docs/README.md](https://github.com/sgl-project/sglang/tree/main/docs/README.md).

## Test the accuracy

If your code changes the model output, please run the accuracy tests. A quick sanity check is the few-shot GSM8K.

```bash
# Launch a server
python3 -m sglang.launch_server --model Qwen/Qwen2-7B-Instruct

# Evaluate
python3 -m sglang.test.few_shot_gsm8k --num-questions 200
```

Please note that the above script is primarily a sanity check, not a rigorous accuracy or speed test.
This test can have significant variance (1%–5%) in accuracy due to batching and the non-deterministic nature of the inference engine.
Also, do not rely on the "Latency/Output throughput" from this script, as it is not a proper speed test.

GSM8K is too easy for state-of-the-art models nowadays. Please try your own more challenging accuracy tests.
You can find additional accuracy eval examples in:

- [test_eval_accuracy_large.py](https://github.com/sgl-project/sglang/blob/main/test/registered/eval/test_eval_accuracy_large.py)
- [test_gpt_oss_1gpu.py](https://github.com/sgl-project/sglang/blob/main/test/registered/core/test_gpt_oss_1gpu.py)

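One way to put a 200-question score in perspective is a back-of-the-envelope standard error. The sketch below treats each question as an independent pass/fail trial, which is a simplifying assumption and not a description of how the evaluation script works; the accuracy value of 0.80 is illustrative, not a measured result.

```python
import math


def accuracy_standard_error(accuracy: float, num_questions: int) -> float:
    """Binomial standard error of an accuracy measured on num_questions samples."""
    return math.sqrt(accuracy * (1.0 - accuracy) / num_questions)


# 0.80 is an illustrative accuracy, not a measured SGLang result.
se = accuracy_standard_error(0.80, 200)
print(f"standard error: {se:.3f}")
# prints: standard error: 0.028
```

Under these assumptions, a difference of a couple of percentage points between two 200-question runs is within the noise, which is consistent with using this script only as a sanity check.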
## Benchmark the speed

Refer to [Benchmark and Profiling](../developer_guide/benchmark_and_profiling.md).

## Requesting a review for merge

You can follow the pull request merge process described in [MAINTAINER.md](https://github.com/sgl-project/sglang/blob/main/.github/MAINTAINER.md).
You will need to work with the Merge Oncall, Codeowner, and other reviewers to get their approvals.
Once you have them, your PR can be merged.

## How to Trigger CI Tests

We have a lot of open PRs but limited CI machines, so only top and trusted contributors have permission to trigger CI tests.
Users with permission are listed in [CI_PERMISSIONS.json](https://github.com/sgl-project/sglang/blob/main/.github/CI_PERMISSIONS.json).

**PR authors** can always use `/rerun-failed-ci` on their own PRs, even if they are not listed in `CI_PERMISSIONS.json`.

For CI to run on a pull request, it must have the "run-ci" label. Authorized users can add the label or rerun failed tests by commenting on the PR with one of these commands:

- `/tag-run-ci-label`: Adds the "run-ci" label. Every future commit will trigger CI.
- `/rerun-failed-ci`: Reruns the failed or flaky tests from the most recent commit.
- `/tag-and-rerun-ci`: A single command that performs both `/tag-run-ci-label` and `/rerun-failed-ci`.
- `/rerun-stage <stage-name>`: Reruns a specific test stage without waiting for its dependencies. This is useful when you want to quickly validate a fix for a specific test failure instead of waiting ~30 minutes for preceding stages to complete.

If you have permission, the [Slash Command Handler](https://github.com/sgl-project/sglang/actions/workflows/slash-command-handler.yml) will run your command and react with a 👍 to your comment. It may take up to a few minutes for the reaction to appear. Here’s a usage [example](https://github.com/sgl-project/sglang/pull/14253#issuecomment-3599509302).

To avoid spamming a PR with too many `/rerun-failed-ci` comments, you can also trigger the command by editing an existing comment and adding any suffix (e.g., `/rerun-failed-ci try again`).

Example of rerunning a single test stage: `/rerun-stage unit-test-backend-4-gpu`.

If you don’t have permission and you’re not the PR author, please ask maintainers to trigger CI for you.

### CI rate limits

Due to CI scheduling and limited resources, higher-priority PRs may preempt running jobs. In such cases, you may need to rerun the tests.
We apply CI rate limits to prevent abuse and ensure fair usage of our CI resources.

Each CI workflow has a default limit defined in its workflow configuration file. For example, in [pr-gate.yml](https://github.com/sgl-project/sglang/blob/main/.github/workflows/pr-gate.yml), the default cooldown period is 120 minutes, and each workflow can override it via the `cool-down-minutes` input parameter:

```yaml
cool-down-minutes:
  description: "Default cooldown period in minutes; 0 disables rate limiting"
  type: number
  default: 120
```

Users listed in [CI_PERMISSIONS.json](https://github.com/sgl-project/sglang/blob/main/.github/CI_PERMISSIONS.json) may have a per-user cooldown interval. In practice, we use the minimum of the workflow’s default window and the user-specific interval.

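The "minimum of the two" rule described above can be sketched as follows. The helper name and its arguments are hypothetical and only illustrate the arithmetic; the actual logic lives in the CI workflow and may differ in detail.

```python
from typing import Optional


def effective_cooldown_minutes(workflow_default: int, user_interval: Optional[int]) -> int:
    """Hypothetical helper: pick the cooldown that applies to one user.

    `user_interval` is None when the user has no per-user interval configured.
    """
    if user_interval is None:
        return workflow_default
    return min(workflow_default, user_interval)


# 120 is the workflow default quoted in this guide; 60 and 240 are made-up user values.
print(effective_cooldown_minutes(120, None))  # prints: 120
print(effective_cooldown_minutes(120, 60))    # prints: 60
print(effective_cooldown_minutes(120, 240))   # prints: 120
```

Because the smaller value wins, a per-user interval can only shorten the wait relative to the workflow default, never lengthen it.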
## Code style guidance

- Avoid code duplication. If the same code snippet (more than five lines) appears multiple times, extract it into a shared function.
- Minimize device synchronization. Reduce expensive CPU-GPU synchronization operations, such as `tensor.item()` or `tensor.cpu()`, whenever possible. Use vectorized code.
- Prioritize extreme efficiency. SGLang is a runtime, and most of your code runs on the critical path for every request. Optimize all minor overheads as much as possible, especially in the model forward code.
  - A common pattern is to perform runtime checks in the model forward pass (e.g., [this](https://github.com/sgl-project/sglang/blob/f1b0eda55c2c4838e8ab90a0fac7fb1e3d7064ab/python/sglang/srt/models/deepseek_v2.py#L486-L491)). These checks very likely give the same result for every layer, so please cache the result as a single boolean value whenever possible.
- Make functions as pure as possible. Avoid in-place modification of arguments.
- Keep files concise. If a file exceeds 2,000 lines of code, split it into multiple smaller files (e.g., `scheduler.py` and `scheduler_output_processor_mixin.py`).
- Keep tests fast.
  - If a single test file runs longer than 500 seconds, split it into multiple smaller files (e.g., `test_eagle_infer_a.py`, `test_eagle_infer_b.py`).
  - If a single job in a GitHub workflow runs longer than 30 minutes, split it into smaller jobs/steps.
  - Reuse server launches in your unit tests to make tests run faster.
- Never use `pickle.loads()`, `pickle.load()`, or `recv_pyobj()` to deserialize untrusted or network-received data. Python's [pickle module is not secure](https://docs.python.org/3/library/pickle.html): it can execute arbitrary code during deserialization. Use safe serialization formats such as msgpack (for example, via [msgspec](https://github.com/jcrist/msgspec)) or JSON instead.
- When supporting new hardware or features, follow these guidelines:
  - Do not drastically change existing code.
  - Always prefer new files to introduce specific components for your new hardware (e.g., `allocator_ascend.py`).
  - If you write multiple if/else blocks for new features, ensure the common path (e.g., NVIDIA hardware or the existing code path) is the first branch.

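To illustrate the serialization guideline above, here is a minimal sketch that uses the standard-library `json` module in place of `pickle` for data that crosses a trust boundary. The function names and the message fields (`rid`, `max_new_tokens`) are made up for the example and do not describe SGLang's actual wire format.

```python
import json
from typing import Any, Dict


def encode_request(payload: Dict[str, Any]) -> bytes:
    """Serialize a request to UTF-8 JSON bytes."""
    return json.dumps(payload).encode("utf-8")


def decode_request(raw: bytes) -> Dict[str, Any]:
    """Parse untrusted bytes as JSON; parsing yields plain data and never runs code."""
    payload = json.loads(raw.decode("utf-8"))
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object")
    return payload


# Field names are made up for illustration.
message = encode_request({"rid": "req-1", "max_new_tokens": 16})
print(decode_request(message))
# prints: {'rid': 'req-1', 'max_new_tokens': 16}
```

Parsing still needs validation of the resulting fields, but unlike `pickle.loads()`, a malformed or hostile message can only produce a parse error or unexpected plain data.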
## How to update sgl-kernel

Since sglang and the `sglang-kernel` (previously `sgl-kernel`) distribution are separate Python packages, our current GitHub CI infrastructure does not support updating a kernel and using it immediately within the same pull request (PR).
To add a new kernel or modify an existing one in the `sgl-kernel/` source tree, you must use multiple PRs.

Follow these steps:

1. Submit a PR to update the sgl-kernel source code without using it in the sglang Python package (e.g., [#8884](https://github.com/sgl-project/sglang/pull/8884/files)).
2. Bump the version of the kernel package (e.g., [#9220](https://github.com/sgl-project/sglang/pull/9220/files)).
   - Once merged, this will trigger an automatic release of the `sglang-kernel` wheel to PyPI.
   - If not urgent, you can wait for other people to release the wheel. A new version will typically be released within one week.
3. Apply the changes:
   - Update the `sglang-kernel` version in `sglang/python/pyproject.toml` to use the modified kernels.
   - Update the related caller code in sglang to use the new kernel.

## Tips for newcomers

If you want to contribute but don’t have a specific idea in mind, pick issues labeled [“good first issue” or “help wanted”](https://github.com/sgl-project/sglang/issues?q=is%3Aissue+label%3A%22good+first+issue%22%2C%22help+wanted%22). These tasks typically have lower complexity and provide an excellent introduction to the codebase.

Also check out the following materials as a starting point:

- [Mini-SGLang](https://github.com/sgl-project/mini-sglang) for a quick overview of the structure of sglang.
- [Code Walk-through](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/tree/main/sglang/code-walk-through) for a deeper look into SGLang’s workflow.
- [GTC-2026 Training Lab](https://drive.google.com/file/d/1mwOZEtipNLJzrflCTodj34KhuOZEoEw5/view?usp=drive_link) for hands-on practice with optimizing, benchmarking, and profiling a launched SGLang instance.

If you have any questions or want to start a discussion, please feel free to ask in our [Slack channel](https://slack.sglang.io).

Thank you for your interest in SGLang. Happy coding!