xwy-amd8 8d06c338d4 fix(kt): fix Kimi K2.5 RAWINT4 CUDA graph capture crash
Three fixes for Kimi K2.5 RAWINT4 failing to start when CUDA graph capture is enabled:

1. fused_marlin_moe.py: Fix an IndentationError caused by a bad merge-conflict
   resolution; the imports were left outside the `if _is_cuda:` block.

2. fused_marlin_moe.py: Add an early return when E=0 or M=0. When
   kt-num-gpu-experts=0, the GPU expert weights are empty tensors (E=0),
   and the Marlin MoE kernel crashes on these empty inputs. Returning
   zeros lets the KT CPU experts contribute the full result.

3. deepseek_v2.py: Skip the dual-stream path when the KT wrapper is used.
   forward_normal_dual_stream uses alt_stream to overlap the shared experts
   with the routed experts, which conflicts with the KT wrapper's internal
   _cpu_stream during CUDA graph capture.
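For item 1, here is a minimal sketch of one way a misplaced import under `if _is_cuda:` produces an IndentationError at import time. The module name `hypothetical_kernels` and symbol `marlin_gemm` are placeholders, not the actual imports in fused_marlin_moe.py. The snippets are only compiled, so the fake module is never imported.

```python
# Placeholder names: the real imports in fused_marlin_moe.py are not shown here.
BROKEN_SRC = """\
_is_cuda = False
if _is_cuda:
from hypothetical_kernels import marlin_gemm
"""

FIXED_SRC = """\
_is_cuda = False
if _is_cuda:
    from hypothetical_kernels import marlin_gemm
"""


def compiles(src: str) -> bool:
    """Return True if `src` compiles, False if it raises IndentationError."""
    try:
        compile(src, "<snippet>", "exec")
    except IndentationError:
        # The `if` body is empty because the import sits at column 0.
        return False
    return True


print(compiles(BROKEN_SRC))  # False
print(compiles(FIXED_SRC))   # True

# With the import back inside the block, a non-CUDA build simply skips it.
namespace: dict = {}
exec(FIXED_SRC, namespace)
print("marlin_gemm" in namespace)  # False
```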
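The early return in item 2 can be sketched in plain Python. This sketch rests on two assumptions: the layer output is the elementwise sum of the GPU-expert and CPU-expert partial results, and a stand-in kernel fails on empty inputs the way the commit describes. `fake_marlin_kernel`, `gpu_experts_or_zeros` and `combine` are hypothetical names, and lists stand in for tensors so the example runs without PyTorch.

```python
from typing import List

Matrix = List[List[float]]


def fake_marlin_kernel(hidden_states: Matrix, num_experts: int) -> Matrix:
    """Hypothetical stand-in for the GPU kernel: it cannot handle empty inputs."""
    if num_experts == 0 or not hidden_states:
        raise RuntimeError("kernel launched on empty inputs")
    return [[2.0 * x for x in row] for row in hidden_states]


def gpu_experts_or_zeros(hidden_states: Matrix, num_gpu_experts: int) -> Matrix:
    """Skip the kernel when E == 0 (no GPU experts) or M == 0 (no rows)."""
    M = len(hidden_states)
    K = len(hidden_states[0]) if M else 0
    E = num_gpu_experts
    if E == 0 or M == 0:
        # Zeros are neutral under addition, so the CPU result passes through.
        return [[0.0] * K for _ in range(M)]
    return fake_marlin_kernel(hidden_states, E)


def combine(gpu_out: Matrix, cpu_out: Matrix) -> Matrix:
    """Assumed combination rule: elementwise sum of the two partial results."""
    return [[g + c for g, c in zip(gr, cr)] for gr, cr in zip(gpu_out, cpu_out)]


h = [[1.0, 2.0], [3.0, 4.0]]
cpu = [[0.5, 0.5], [1.0, 1.0]]
print(combine(gpu_experts_or_zeros(h, 0), cpu))  # [[0.5, 0.5], [1.0, 1.0]]
```

Returning zeros with the same shape as the input, instead of returning early with `None`, keeps the caller's combination step unchanged.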
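The dispatch change in item 3 amounts to adding one condition to the path selection. Below is a minimal sketch that assumes the routed experts are an instance of a wrapper class when KT is enabled and that the single-stream fallback is named `forward_normal`. `KTWrapper`, `MoELayerConfig` and `select_forward_path` are hypothetical names, and the real condition in deepseek_v2.py likely checks more than this.

```python
from dataclasses import dataclass
from typing import Optional


class KTWrapper:
    """Hypothetical marker for the KT wrapper, which owns its own _cpu_stream."""


@dataclass
class MoELayerConfig:
    experts: object
    alt_stream: Optional[object]  # None when dual-stream overlap is disabled


def select_forward_path(cfg: MoELayerConfig) -> str:
    """Pick the MoE forward path; never use alt_stream with the KT wrapper."""
    uses_kt_wrapper = isinstance(cfg.experts, KTWrapper)
    if cfg.alt_stream is not None and not uses_kt_wrapper:
        return "forward_normal_dual_stream"
    # Single-stream path: avoids mixing alt_stream with the wrapper's stream.
    return "forward_normal"


print(select_forward_path(MoELayerConfig(experts=KTWrapper(), alt_stream=object())))
# forward_normal
```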

Fixes #1866
2026-03-01 23:12:05 +08:00