sglang

mirror of https://github.com/kvcache-ai/sglang.git synced 2026-06-30 03:37:51 +00:00

Files

xwy-amd8 8d06c338d4 fix(kt): fix Kimi K2.5 RAWINT4 CUDA graph capture crash

Three fixes for Kimi K2.5 RAWINT4 failing to start with CUDA graph:

1. fused_marlin_moe.py: Fix IndentationError from a bad merge conflict
   resolution: imports were left outside the `if _is_cuda:` block.

2. fused_marlin_moe.py: Add early return for E=0/M=0. When
   kt-num-gpu-experts=0, GPU expert weights are empty tensors (E=0).
   The marlin MoE kernel crashes on these empty inputs. Return zeros
   so KT CPU experts can contribute the full result.

3. deepseek_v2.py: Skip the dual-stream path for the KT wrapper.
   forward_normal_dual_stream uses alt_stream for shared expert
   parallelism, which conflicts with the KT wrapper's internal
   _cpu_stream during CUDA graph capture.
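
The early return in fix 2 can be illustrated with a minimal sketch. This is
not the code in fused_marlin_moe.py: the function names are hypothetical, and
plain Python lists stand in for tensors so the example runs without a GPU.
It only shows the guard pattern, assuming E is the number of GPU experts and
M is the number of tokens.

```python
from typing import Callable, List, Sequence

Matrix = List[List[float]]


def moe_forward_with_empty_guard(
    hidden_states: Sequence[Sequence[float]],
    num_gpu_experts: int,
    kernel: Callable[[Sequence[Sequence[float]]], Matrix],
) -> Matrix:
    """Hypothetical wrapper: skip the kernel when E == 0 or M == 0."""
    num_tokens = len(hidden_states)  # M
    if num_gpu_experts == 0 or num_tokens == 0:
        # Nothing for the GPU experts to compute. Return zeros with the
        # input's shape so another contributor (here, the CPU experts)
        # can supply the full result when outputs are summed.
        return [[0.0] * len(row) for row in hidden_states]
    return kernel(hidden_states)


def failing_kernel(_: Sequence[Sequence[float]]) -> Matrix:
    # Stand-in for a kernel that cannot handle empty expert weights.
    raise RuntimeError("kernel launched on empty expert weights")


if __name__ == "__main__":
    # E == 0: the guard returns zeros and the kernel is never called.
    print(moe_forward_with_empty_guard([[1.0, 2.0]], 0, failing_kernel))
```

With the guard in place, the zero output is a neutral term in the sum, so
the CPU-side result passes through unchanged.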

Fixes #1866

2026-03-01 23:12:05 +08:00

sglang

fix(kt): fix Kimi K2.5 RAWINT4 CUDA graph capture crash

2026-03-01 23:12:05 +08:00

pyproject_cpu.toml

refactor: replace local proto compilation with smg-grpc-proto package (#18682)

2026-02-12 05:29:24 -08:00

pyproject_npu.toml

[diffusion] model: LTX-2 Support PR3 (#19151)

2026-02-24 16:55:28 +08:00

pyproject_other.toml

[diffusion] model: LTX-2 Support PR3 (#19151)

2026-02-24 16:55:28 +08:00

pyproject_xpu.toml

refactor: replace local proto compilation with smg-grpc-proto package (#18682)

2026-02-12 05:29:24 -08:00

pyproject.toml

[diffusion] chore: tiny fix pyproject.toml (#19256)

2026-02-25 11:57:53 +08:00