sglang

mirror of https://github.com/kvcache-ai/sglang.git synced 2026-06-30 03:37:51 +00:00

Files

xwy-amd8 2ba1f0dea6 Skip KT CPU-GPU coordination during CUDA graph capture

During CUDA graph capture (regular or piecewise, PCG), torch.cuda.synchronize()
and CPU-GPU expert coordination are not allowed. Detect capture mode
via is_in_piecewise_cuda_graph() and torch.cuda.is_current_stream_capturing(),
and delegate directly to the GPU method in those cases.
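Below is a minimal sketch of that dispatch in Python. The class and attribute names (`KTMoEDispatcherSketch`, `gpu_method`, `coordinated_method`, `capture_checks`) are hypothetical and are not sglang's actual API. The two predicates named in the commit are passed in as plain callables, which lets the sketch run without a GPU.

```python
from typing import Any, Callable, List


class KTMoEDispatcherSketch:
    """Hypothetical wrapper that routes a MoE forward call either through
    CPU-GPU expert coordination or directly to a GPU-only method."""

    def __init__(
        self,
        gpu_method: Callable[..., Any],
        coordinated_method: Callable[..., Any],
        capture_checks: List[Callable[[], bool]],
    ) -> None:
        self.gpu_method = gpu_method
        self.coordinated_method = coordinated_method
        # Assumed wiring in sglang: [is_in_piecewise_cuda_graph,
        # torch.cuda.is_current_stream_capturing]
        self.capture_checks = capture_checks

    def is_capturing(self) -> bool:
        # True if any capture mode (regular or piecewise) is active.
        return any(check() for check in self.capture_checks)

    def forward(self, *args: Any, **kwargs: Any) -> Any:
        if self.is_capturing():
            # Inside graph capture, synchronize() and CPU round-trips are
            # not allowed, so skip coordination and call the GPU path.
            return self.gpu_method(*args, **kwargs)
        # Outside capture, use the CPU-GPU coordinated path.
        return self.coordinated_method(*args, **kwargs)


# Usage with fake predicates standing in for the real capture checks.
state = {"capturing": False}
dispatcher = KTMoEDispatcherSketch(
    gpu_method=lambda x: ("gpu", x),
    coordinated_method=lambda x: ("cpu+gpu", x),
    capture_checks=[lambda: state["capturing"]],
)
print(dispatcher.forward(1))
state["capturing"] = True
print(dispatcher.forward(2))
```

Injecting the predicates keeps the routing decision independent of the capture APIs. This means the same logic can be exercised on a CPU-only machine.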

This enables running Qwen3.5 with --attention-backend triton without
--disable-cuda-graph, improving decode throughput from ~11 tok/s to ~65 tok/s.
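Taking the reported figures at face value, this is roughly a 5.9x decode speedup. Assuming single-stream decoding, where per-token latency is the reciprocal of throughput, latency drops from about 91 ms to about 15 ms per token:

```latex
\frac{65\ \text{tok/s}}{11\ \text{tok/s}} \approx 5.9\times,
\qquad
\frac{1}{11\ \text{tok/s}} \approx 90.9\ \text{ms/token}
\;\longrightarrow\;
\frac{1}{65\ \text{tok/s}} \approx 15.4\ \text{ms/token}
```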

2026-02-26 08:34:30 +00:00

sglang

Skip KT CPU-GPU coordination during CUDA graph capture

2026-02-26 08:34:30 +00:00

pyproject_cpu.toml

refactor: replace local proto compilation with smg-grpc-proto package (#18682)

2026-02-12 05:29:24 -08:00

pyproject_npu.toml

[diffusion] model: LTX-2 Support PR3 (#19151)

2026-02-24 16:55:28 +08:00

pyproject_other.toml

[diffusion] model: LTX-2 Support PR3 (#19151)

2026-02-24 16:55:28 +08:00

pyproject_xpu.toml

refactor: replace local proto compilation with smg-grpc-proto package (#18682)

2026-02-12 05:29:24 -08:00

pyproject.toml

[diffusion] chore: tiny fix pyproject.toml (#19256)

2026-02-25 11:57:53 +08:00