MARATRIX
|
069d4c577b
|
Fix Kimi K2.5 PP layer range exposure for PD disaggregation (#19959)
Signed-off-by: yafeng.li <yafeng.li@mthreads.com>
|
2026-03-06 16:14:02 -08:00 |
|
Liangsheng Yin
|
ddcecdea49
|
[Core] Unify max_num_reqs dp_size division for pool sizing (#20063)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-03-06 16:12:59 -08:00 |
|
Kangyan-Zhou
|
7a12255b6e
|
fix: set first_token_time before computing decode_throughput for single-batch completions (#19984)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-03-06 16:11:41 -08:00 |
|
Aurick Qiao
|
5c8e28698c
|
Add cleanup for _ATTN_TP in parallel_state.py (#19978)
|
2026-03-06 15:43:31 -08:00 |
|
Shu Wang
|
61de303f0a
|
Fix fallback to default tactic (flashinfer autotuner) with trtllm_fp4_block_scale_moe (#19189)
|
2026-03-06 15:15:04 -08:00 |
|
Kangyan-Zhou
|
e89069ee64
|
Fallback to torch.cuda.mem_get_info() when nvidia-smi is unavailable (#18957)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-03-06 15:00:08 -08:00 |
|
Liangsheng Yin
|
604db4471d
|
[Core] Clarify memory variable naming in model runner (#20060)
|
2026-03-06 14:00:46 -08:00 |
|
Liangsheng Yin
|
7a6cf0e9ba
|
[Core] Extract _calculate_mamba_ratio and _init_pools from init_memory_pool (#20058)
|
2026-03-06 13:37:22 -08:00 |
|
Mohammad Miadh Angkad
|
759700c808
|
Fix SM120 triton_kernels MXFP4 block_k for GPT-OSS (#20040)
|
2026-03-06 10:53:08 -08:00 |
|
R0CKSTAR
|
de1a0afcbc
|
[MUSA][10/N] Add GGUF support (#18357)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
|
2026-03-06 10:50:35 -08:00 |
|
JohnHerry
|
e8f2b80340
|
[diffusion] improve: improve code readability of DenoisingStage (#20003)
|
2026-03-06 23:23:44 +08:00 |
|
xingsy97
|
54634b9a40
|
[Kernel] Dispatch exp/sin/cos through dtype_trait (#19798)
|
2026-03-06 22:57:52 +08:00 |
|
Johnsonms
|
2d266c73ea
|
Migrate renorm kernels from sgl-kernel to FlashInfer JIT (#18854)
|
2026-03-06 22:53:28 +08:00 |
|
Xiaoyu Zhang
|
6d22c9f369
|
[Diffusion] Move hf kernels diffusion cuda kernels skills to SGLD (#20001)
Co-authored-by: Mick <mickjagger19@icloud.com>
|
2026-03-06 22:16:06 +08:00 |
|
Yuan Luo
|
f7de9375ac
|
[GDN][Qwen3-Next][Qwen3.5] Fuse fused_gdn_gating and fused_recurrent_gated_delta_rule_update in verify_target (#19775)
|
2026-03-06 21:42:44 +08:00 |
|
Prozac614
|
e3b581ce6b
|
[diffusion] fix: remove num_frames in wan2_1_t2v_1_3b_lora_1gpu test (#20009)
Co-authored-by: daiweitao <dwti614707404@163.com>
|
2026-03-06 21:36:43 +08:00 |
|
Kangyan-Zhou
|
25e678d933
|
[diffusion] endpoint: add /server_info and /model_info endpoints for gateway discovery (#20020)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-03-06 21:36:13 +08:00 |
|
inkcherry
|
84aaa69795
|
[AMD] Use bfloat16 for correction_bias in AITER FP8 path to avoid runtime dtype conversion for dsv3 (#19843)
|
2026-03-06 00:57:12 -08:00 |
|
Clint
|
27053aa5ed
|
Fix MLA decode path returning unwritten (padded) rows (#19902)
|
2026-03-06 00:54:29 -08:00 |
|
xdtbynd
|
0252ca8255
|
[Bugfix] Fix the bug blocking the startup of Llama-3.2-11b-Vision-Instruct (#19638)
Co-authored-by: sglang-npu-bot <sglangnpu@163.com>
|
2026-03-06 16:21:50 +08:00 |
|
Zheng Wengang
|
da27d9bff6
|
[Bug-Fix][EPD]: skip log waiting-image-req for zmq_to_tokenzer/mooncake (#19555)
|
2026-03-06 14:39:22 +08:00 |
|
Mook
|
be9a9e4819
|
refactor(multimodal/test): centralize model names and shared utilities in test_utils (#19354)
Co-authored-by: Ratish P <114130421+Ratish1@users.noreply.github.com>
|
2026-03-05 20:09:42 -08:00 |
|
Baizhou Zhang
|
51e5dc845a
|
Revert "[Kernel Slimming] Migrate NVFP4 kernels to JIT" (#20005)
|
2026-03-05 19:40:00 -08:00 |
|
sushil Dubey
|
6e5a2de354
|
[diffusion] fix: fix reading multiple prompts from prompt file (#19075)
Signed-off-by: Sushil Dubey <sushil.dubey@intel.com>
|
2026-03-06 11:23:31 +08:00 |
|
Simo Lin
|
9502369488
|
fix(grpc): add server-side keepalive options to prevent GOAWAY (#19986)
Signed-off-by: Simo Lin <linsimo.mark@gmail.com>
|
2026-03-05 18:56:35 -08:00 |
|
liupeng374
|
5471e4a492
|
[NPU][Feature] eliminate dsv3 redundant rotary embed calculation (#19842)
|
2026-03-06 09:02:14 +08:00 |
|
chenxu214
|
b912d7ae19
|
[OPT] Skip the first delayer to maximize the BS of the decoding. (#19836)
|
2026-03-06 08:53:19 +08:00 |
|
shadowxz109
|
261be85ecc
|
Support mrope_position_delta cache
|
2026-03-06 08:50:53 +08:00 |
|
Xinyuan Tong
|
9ebffef1ef
|
[FIX] NSA backend page_table overflow in speculative decoding target_verify (#19016)
|
2026-03-05 16:04:58 -08:00 |
|
Ajay Anubolu
|
13af7cbb02
|
fix: use consistent time denominator for throughput metrics in bench_one_batch_server (#19223)
|
2026-03-05 15:58:17 -08:00 |
|
Chang Su
|
dd2bbe6d62
|
fix(grpc): use context.abort() with proper status codes instead of in-band errors (#19972)
Signed-off-by: Chang Su <chang.s.su@oracle.com>
|
2026-03-05 14:53:18 -08:00 |
|
Qiaolin Yu
|
46dced64ea
|
Adjust padding size to improve triton_kernels moe performance (#19174)
|
2026-03-05 14:50:40 -08:00 |
|
kpham-sgl
|
346a4131cf
|
[Spec] Refactor NaN/OOB checks to async maybe_detect_* with env-var control (#19899)
Co-authored-by: hnyls2002 <lsyincs@gmail.com>
|
2026-03-05 13:51:05 -08:00 |
|
Xinyu Zhang
|
b3cfad0a80
|
Add Ray actor support for scheduler process management (DP=1) (#17684)
Co-authored-by: Cursor <cursoragent@cursor.com>
|
2026-03-05 13:21:23 -08:00 |
|
sglang-bot
|
ebb66cc1de
|
[misc] Priority scheduling metrics cleanup (#19927)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-03-05 12:42:42 -08:00 |
|
danielafrimi
|
ff6048fb9c
|
rename nemotron reasoning parser (#19865)
Signed-off-by: dafrimi <dafrimi@nvidia.com>
|
2026-03-05 11:27:07 -08:00 |
|
Mohammad Miadh Angkad
|
41fd53fe37
|
Fix profile_activities parameter name in bench_one_batch_server_internal.py (#19954)
|
2026-03-05 10:34:06 -08:00 |
|
akhilg-nv
|
73d272bddb
|
Revised fix for HybridAttnBackend forward for linear attn (#19369)
|
2026-03-06 00:05:35 +08:00 |
|
Zheng Wengang
|
0de0d74195
|
[EPD][Feat]support adaptive forward (#18118)
|
2026-03-05 21:12:30 +08:00 |
|
StonyPort
|
806d41ab65
|
[quant] fix fp32 downcasting (#19844)
Co-authored-by: qiuxuan.lzw <qiuxuan.lzw@alibaba-inc.com>
|
2026-03-05 17:54:59 +08:00 |
|
Rain Jiang
|
472eef4071
|
fa4 cleanup (#19727)
|
2026-03-05 17:54:25 +08:00 |
|
Chi McIsaac
|
c36de62bfc
|
[diffusion] fix images/edit with 2 images (#17520)
Signed-off-by: Chi McIsaac <chixie.mcisaac@gmail.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
|
2026-03-05 16:56:39 +08:00 |
|
xingsy97
|
dbc896f204
|
[Test] Enhance JIT kvcache store kernel test coverage (#19630)
|
2026-03-05 16:17:15 +08:00 |
|
Tiwei Bie
|
727face6c2
|
[DLLM] Add initial radix cache support (#18724)
|
2026-03-04 23:24:09 -08:00 |
|
Kalyan Kumar
|
c1df359b44
|
Add XPU profiler activity support in benchmark code (#12981)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-03-04 23:22:56 -08:00 |
|
Mohammad Miadh Angkad
|
2bdd89a6cd
|
[Kernel Slimming] Migrate NVFP4 kernels to JIT (#19437)
|
2026-03-05 15:22:28 +08:00 |
|
Yilong Zhao
|
1bbfed0539
|
[misc] add env for http keep alive timeout (#19847)
|
2026-03-04 22:00:51 -08:00 |
|
Chenxi Li
|
86c5617787
|
[BUG]: fix prevent illegal memory access in Mamba SSM tracking during EAGLE speculative verification (#19415)
Co-authored-by: ConnorLi96 <ConnorLi96@users.noreply.github.com>
|
2026-03-04 21:13:21 -08:00 |
|
Baizhou Zhang
|
10c65df48a
|
[Bug] Fix lora tp bug on H200 (#19769)
|
2026-03-04 20:11:02 -08:00 |
|
Xinyi Song
|
0e6a64712a
|
[bugfix] Fix PPMissingLayer AttributeError when Using PP (#19804)
|
2026-03-04 19:48:15 -08:00 |
|