sglang/python
xwy-amd8 2ba1f0dea6 Skip KT CPU-GPU coordination during CUDA graph capture
During CUDA graph capture (regular or piecewise/PCG), neither
torch.cuda.synchronize() nor CPU-GPU expert coordination is allowed. Detect
capture mode via is_in_piecewise_cuda_graph() and
torch.cuda.is_current_stream_capturing(), and in those cases delegate
directly to the GPU method.
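The dispatch rule above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this commit. The names CaptureAwareExpertMethod, gpu_method, cpu_gpu_method and in_piecewise_graph are hypothetical. in_piecewise_graph stands in for sglang's is_in_piecewise_cuda_graph(), and the sketch does not assume where that function is imported from. Only torch.cuda.is_current_stream_capturing() is a real PyTorch call.

```python
from typing import Any, Callable


def default_stream_capturing() -> bool:
    """True if the current CUDA stream is recording a CUDA graph (regular capture)."""
    try:
        import torch
    except ImportError:  # no PyTorch installed: nothing can be capturing
        return False
    return bool(torch.cuda.is_available() and torch.cuda.is_current_stream_capturing())


class CaptureAwareExpertMethod:
    """Hypothetical wrapper: skip CPU-GPU expert coordination while a graph is captured."""

    def __init__(
        self,
        gpu_method: Callable[..., Any],
        cpu_gpu_method: Callable[..., Any],
        in_piecewise_graph: Callable[[], bool] = lambda: False,
        stream_capturing: Callable[[], bool] = default_stream_capturing,
    ) -> None:
        self.gpu_method = gpu_method          # GPU-only path, safe to record into a graph
        self.cpu_gpu_method = cpu_gpu_method  # path that may call torch.cuda.synchronize()
        # Hypothetical stand-in for is_in_piecewise_cuda_graph().
        self.in_piecewise_graph = in_piecewise_graph
        self.stream_capturing = stream_capturing

    def apply(self, *args: Any, **kwargs: Any) -> Any:
        # Host syncs and CPU-GPU handoff are illegal during PCG or regular capture,
        # so delegate straight to the GPU method in that case.
        if self.in_piecewise_graph() or self.stream_capturing():
            return self.gpu_method(*args, **kwargs)
        return self.cpu_gpu_method(*args, **kwargs)
```

The capture checks are passed in as callables. This keeps the dispatch logic testable on a machine without a GPU. Outside of capture, calls still reach the coordinated CPU-GPU path.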

This enables running Qwen3.5 with --attention-backend triton without
--disable-cuda-graph, improving decode throughput from ~11 tok/s to ~65 tok/s.
2026-02-26 08:34:30 +00:00