32 Commits

Author SHA1 Message Date
Mohammad Miadh Angkad
bcc0c65aa8 [DSA] Hopper FP8 FlashMLA KV padding (#22372) 2026-04-12 02:19:17 -07:00
Zhangheng
3d3a32c0b9 [HiSparse]: Add readme docs for HiSparse Feature (#22238) 2026-04-07 00:39:24 -07:00
Mohammad Miadh Angkad
b311db2e49 [Doc] Fix and improve DeepSeek V3.2/GLM-5 documentation (#22179) 2026-04-05 23:26:42 -07:00
Baizhou Zhang
106baedbfb [Doc] Update GLM-5 instructions in sglang documentation (#21716) 2026-04-05 03:13:07 -07:00
Baidu-AIAK
6851613b93 [Bugfix] For cp: Fixed hang problem in prefix cache and kvcache support fp8 in-seq-split mode (#19656)
Co-authored-by: vincent <vincent@vincentdeMacBook-Pro.local>
2026-03-03 19:19:46 -08:00
Rain Jiang
0ffd0a3995 Nsa trtllm mla sparse fp8 support with Deepseek v3.2 NVFP4 (#18389) 2026-02-16 09:29:54 +08:00
dongjiyingdjy
8b4c364960 refactor context parallel state (#17213)
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2026-02-13 23:18:17 +08:00
Baizhou Zhang
947927bdb5 [V3.2] Change default CP token split method to --round-robin-split (#18613) 2026-02-11 20:14:35 +08:00
Baizhou Zhang
1d942e4eef [DeepSeek] Update tests and document for DeepSeek V3.2 NVFP4 checkpoint (#17657) 2026-01-27 22:10:57 +08:00
Hubert Lu
df42f4d386 [AMD] Update dsv3.2 AMD GPU docs and unify ROCm TileLang build (#17783)
Co-authored-by: wufann <715544327@qq.com>
2026-01-26 21:10:32 -08:00
ybyang
2122fea3c4 Update deepseekV32 Cp doc (#17054) 2026-01-14 11:19:26 +08:00
ybyang
aab640c99f add doc for dsv32 cp+pp (#16916) 2026-01-12 19:14:07 +08:00
hlu1
aeb480c11f Add top-p to run_eval.py (#16844) 2026-01-10 17:10:37 +08:00
Baizhou Zhang
f07e76b229 Multiple refactors of DeepSeek V32 and context parallel (#16305) 2026-01-03 02:21:22 +08:00
Yongfei Xu
0d244116d2 [DeepSeek v3.2] opt Context Parallelism: support fused moe, multi batch and fp8 kvcache (#13959) 2026-01-02 23:49:14 +08:00
b8zhong
d20699a33c [Deepseek V3.2] Support Overlap Spec + NSA (#15307)
Co-authored-by: Brayden Zhong <b8zhong@users.noreply.github.com>
2025-12-17 13:35:39 -08:00
Ashton Chew
2bdbaef18e [DeepSeekV3.2] Add pure TP+MTP test (#15088)
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
2025-12-16 21:48:12 -08:00
almaslof
d0f756aec9 [docs] Fix kernel name (#14887) 2025-12-11 10:48:16 -05:00
George Armstrong
91c9c14c28 DOC update nemo-skills in docs (#14555)
Signed-off-by: George Armstrong <georgea@nvidia.com>
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
2025-12-06 19:03:08 -08:00
Baizhou Zhang
7e78825d5a [Tiny]Small fixes in deepseek v32 doc (#14372)
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
2025-12-03 11:35:40 -08:00
Baizhou Zhang
4bcc5879af [Doc] Fix DeepSeek V32 Doc (#14336) 2025-12-02 21:06:55 -08:00
Baizhou Zhang
922054079c [Doc] Update DeepSeek-V3.2 document (#14321) 2025-12-02 18:19:39 -08:00
Lianmin Zheng
bc3d2a85af [Minor] update docs (#14212) 2025-12-01 02:33:58 -08:00
YAMY
decb48965d [DeepSeekV3.2] Enable pure TP & Partial DP Attention (#13646) 2025-11-30 15:59:23 -08:00
hlu1
7291c72e57 [Deepseek V3.2] Change indexer weights_proj to fp32 (#13459) 2025-11-20 12:24:10 -08:00
lixiaolx
d368c7451a (1/n)support context parallel with deepseekv3.2-DSA (#12065) 2025-11-16 20:12:25 -08:00
YAMY
190002c613 [Docs][DeepseekV3.2] Update deepseekv3.2 docs for mha short seq prefill (#12868) 2025-11-08 00:11:02 -08:00
Baizhou Zhang
621dfb8886 Import flash_mla from sgl-kernel (#12135) 2025-10-29 23:54:21 -07:00
hlu1
0ee831dee0 Update deepseek_v32.md (#12296) 2025-10-28 14:52:38 -07:00
Baizhou Zhang
97828878d8 [Doc] Small update of DeepSeek v3.2 document (#12138) 2025-10-25 20:34:05 -07:00
Baizhou Zhang
bcecf27e7c [Doc] Fix format for deepseek v3.2 document (#12130) 2025-10-25 15:07:50 -07:00
Baizhou Zhang
729b242934 [Doc] Add documentation for DeepSeek V3.2 (#11877)
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
Co-authored-by: ybyang <ybyang7@iflytek.com>
2025-10-24 19:06:22 -07:00