mirror of https://github.com/kvcache-ai/sglang.git
synced 2026-06-29 19:28:51 +00:00
* fix(v4-flash): remove broken MXFP4 weight cache + fix rsf double-apply

move routed_scaling_factor application from inside apply_v4_triton_kernels_moe to the caller (mxfp4_deepseek.apply), mirroring the trtllm path convention. This fixes a latent double-apply when SGLANG_OPT_MXFP4_FUSE_RSF_SHARED_ADD is enabled.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(scheduler): revert PR #38 req_pool changes that break TP-only mode

PR #38 introduced changes that together cause scheduler hang on TP-only configurations with max_running_requests=1:

1. scheduler.py: Removed `if self.pp_size > 1:` guard in get_num_allocatable_reqs, causing TP-only mode to check available_size() unconditionally.
2. memory_pool.py: Changed free_slots from `range(size)` to `range(1, size)` to reserve index 0. With max_running_requests=1, this produces empty free_slots list.
3. scheduler_runtime_checker_mixin.py: Changed expected_free from `req_total_size` to `req_total_size - 1` to match the reserved slot.

This fix reverts all 4 locations to v0.6.1.post1 behavior.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(cuda_graph): use out-of-band _replay_forward_batch for non-DSV4 backends

Cherry-pick fix from upstream 3ffc34dbe to resolve TypeError when non-DSV4 backends (TritonAttnBackend, etc.) receive unexpected out_cache_loc kwarg during CUDA graph replay.

Instead of passing out_cache_loc as a parameter (which requires all backends to update their signatures), use an out-of-band attribute:

- Set attn_backend._replay_forward_batch before the call
- DSV4 backend reads out_cache_loc from this attribute
- Clear the attribute after the call

Conflict resolution: kept kt-sglang's attribute path `self.model_runner.attn_backend` (vs upstream's `self.attn_backend`).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: remove undefined _GraphBucket reference in cuda graph replay

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
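The empty free list caused by the memory_pool.py change can be reproduced in plain Python. This is a minimal sketch, assuming the pool builds its free slot indices from a range; `make_free_slots` and `reserve_zero` are hypothetical names for illustration, not the actual memory_pool.py code.

```python
def make_free_slots(size: int, reserve_zero: bool) -> list:
    """Hypothetical helper: build the list of free request-slot indices."""
    # reserve_zero=True mimics range(1, size); False mimics range(size).
    start = 1 if reserve_zero else 0
    return list(range(start, size))


# With max_running_requests=1 the pool is assumed to hold a single slot.
reverted = make_free_slots(1, reserve_zero=False)  # reverted behavior
reserved = make_free_slots(1, reserve_zero=True)   # PR #38 behavior

print("range(size):   ", reverted)
print("range(1, size):", reserved)
```

With a single-slot pool, reserving index 0 leaves nothing to allocate, so an unconditional available_size() check would never admit a request.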
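The out-of-band attribute approach from the cuda_graph fix can be sketched as follows. This is an illustration under stated assumptions, not the real code: the two backend classes, the `init_replay` method name, and the `replay` helper are hypothetical stand-ins; only the `_replay_forward_batch` attribute and the `out_cache_loc` field come from the commit message.

```python
from types import SimpleNamespace


class LegacyBackend:
    """Stand-in for a backend whose replay method has no out_cache_loc kwarg."""

    def init_replay(self, bs):
        return ("legacy", bs, None)


class Dsv4LikeBackend:
    """Stand-in for a backend that needs out_cache_loc during replay."""

    def init_replay(self, bs):
        # Read the batch from the out-of-band attribute, if one was set.
        fb = getattr(self, "_replay_forward_batch", None)
        loc = fb.out_cache_loc if fb is not None else None
        return ("dsv4", bs, loc)


def replay(attn_backend, forward_batch, bs):
    # Set the attribute before the call so signatures stay unchanged.
    attn_backend._replay_forward_batch = forward_batch
    try:
        return attn_backend.init_replay(bs)
    finally:
        # Clear the attribute after the call, even if the call raises.
        attn_backend._replay_forward_batch = None


batch = SimpleNamespace(out_cache_loc=[3, 4])
legacy_result = replay(LegacyBackend(), batch, 2)
dsv4_backend = Dsv4LikeBackend()
dsv4_result = replay(dsv4_backend, batch, 2)
```

The `try`/`finally` is a design choice of this sketch: it keeps a stale batch from leaking into a later replay if the backend call fails.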