mirror of
https://github.com/kvcache-ai/sglang.git
synced 2026-06-30 03:37:51 +00:00
Three fixes for Kimi K2.5 RAWINT4 failing to start with CUDA graph:

1. fused_marlin_moe.py: Fix an IndentationError caused by a bad merge conflict resolution: imports were left outside the `if _is_cuda:` block.
2. fused_marlin_moe.py: Add an early return for E=0/M=0. When kt-num-gpu-experts=0, the GPU expert weights are empty tensors (E=0), and the marlin MoE kernel crashes on these empty inputs. Return zeros so the KT CPU experts can contribute the full result.
3. deepseek_v2.py: Skip the dual-stream path for the KT wrapper. `forward_normal_dual_stream` uses `alt_stream` for shared expert parallelism, which conflicts with the KT wrapper's internal `_cpu_stream` during CUDA graph capture.

Fixes #1866
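The early return in fix 2 can be illustrated with a minimal sketch. This is not the code from fused_marlin_moe.py: it uses nested Python lists as a stand-in for tensors, and the names `moe_forward_with_guard`, `run_kernel`, and `crashing_kernel` are hypothetical, chosen only to show the guard logic.

```python
from typing import Callable, List

Matrix = List[List[float]]  # stand-in for a 2-D tensor of shape (M, hidden)


def moe_forward_with_guard(
    hidden_states: Matrix,
    num_gpu_experts: int,
    run_kernel: Callable[[Matrix], Matrix],
) -> Matrix:
    """Hypothetical wrapper: skip the kernel when E == 0 or M == 0."""
    num_tokens = len(hidden_states)
    if num_gpu_experts == 0 or num_tokens == 0:
        # Return zeros with the input's shape. The GPU side contributes
        # nothing, so adding the CPU experts' output gives the full result.
        return [[0.0] * len(row) for row in hidden_states]
    return run_kernel(hidden_states)


def crashing_kernel(hidden_states: Matrix) -> Matrix:
    # Stand-in for a kernel that cannot handle empty expert weights.
    raise RuntimeError("kernel launched with empty inputs")


# With zero GPU experts the kernel is never called, so nothing raises.
gpu_part = moe_forward_with_guard([[1.0, 2.0], [3.0, 4.0]], 0, crashing_kernel)
```

Returning zeros rather than raising or returning `None` keeps the caller's combine step unchanged: the GPU contribution is simply an additive identity.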