Jianwei Dong
779bf14556
[doc]: add Experts sched tutorial (#1802)
* Change num gpu experts to gpu expert masks and add eplb statistics
* [feat]: update examples
* [fix]: fix fp8 perchannel
* Delete useless tests
* Delete useless tests
* add experts_sched tutorial
---------
Co-authored-by: ouqingliang <1692110604@qq.com>
2026-01-22 15:40:07 +08:00
Peilin Li
a4de664e62
Add AutoDL Tutorial (#1801)
2026-01-22 14:52:47 +08:00
ErvinXie
d2305538f7
Modify installation steps in Kimi-K2-Thinking-Native.md (#1800)
Updated installation instructions for the sglang repository.
2026-01-21 15:46:21 +08:00
ZiWei Yuan
b096b01fbc
[docs]: add kt-cli doc and update corresponding website (#1768)
2025-12-29 23:06:22 +08:00
Oql
63796374c1
[docs]: fix and add MiniMax-M2 tutorial images (#1752)
2025-12-25 20:14:35 +08:00
ZiWei Yuan
3315335fb1
[docs]: update docs to kt-kernel & add amd_blis doc (#1744)
2025-12-24 17:55:15 +08:00
ErvinXie
d8046e1bb4
Kt minimax (#1742)
[feat]: fp8 kernel and kt-cli support
2025-12-24 15:39:44 +08:00
mrhaoxx
e7d277d163
[docs]: refine README for dpo updates (#1740)
* [docs]: refine dpo tutorial
* [docs]: refine README for dpo updates
* Update doc/en/DPO_tutorial.md
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* [docs]: update website doc & refine location
---------
Co-authored-by: ErvinXie <ervinxie@foxmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: ZiWei Yuan <yzwliam@126.com>
2025-12-24 11:20:08 +08:00
mrhaoxx
dee1e211d5
[docs]: refine dpo tutorial (#1739)
2025-12-22 18:44:24 +08:00
Peilin Li
16d5d89f50
[docs]: Update Python version options in DPO tutorial (#1734)
2025-12-20 13:44:35 +08:00
Peilin Li
df998e0f36
[docs]: Add RL-DPO Tutorial (#1733)
2025-12-20 12:49:02 +08:00
Shaoxu Cheng
f25e58ad69
fix: qwen3-npu bugs; update: add readme-for-qwen3-npu (#1717)
* fix: qwen3-npu bugs; update: add readme-for-qwen3-npu
* fix: Correct the README description
2025-12-16 14:27:04 +08:00
RICHARDNAN
18fb8fc897
NPU: revise benchmark results and prerequisites (#1716)
* Update DeepseekR1_V3_tutorial_zh_for_Ascend_NPU.md
* Update DeepseekR1_V3_tutorial_zh_for_Ascend_NPU.md
* Revise Ascend NPU tutorial for Docker deployment
Updated the tutorial for deploying on the Ascend NPU, changing the 'Conda部署' (Conda deployment) section to '镜像部署' (image deployment) and providing specific commands for Docker container setup and Python environment installation.
* Update DeepseekR1 tutorial for Ascend NPU
* Update DeepseekR1_V3_tutorial_zh_for_Ascend_NPU.md
* Update W8A8 weight link in tutorial
* Update doc/zh/DeepseekR1_V3_tutorial_zh_for_Ascend_NPU.md
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* Refactor Docker command and update package manager
Updated Docker run command to simplify device specifications and corrected package manager command from 'apt' to 'yum'.
* Update DeepseekR1_V3_tutorial_zh_for_Ascend_NPU.md
* Revise benchmark results and prerequisites
Updated performance results and hardware specifications.
* Update doc/zh/DeepseekR1_V3_tutorial_zh_for_Ascend_NPU.md
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
---------
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-16 14:26:44 +08:00
RICHARDNAN
6431888928
Add deployment via Docker image (#1691)
2025-12-11 14:11:27 +08:00
ErvinXie
eefc8cf98d
Update Kimi-K2-Thinking-Native.md (#1684)
2025-12-08 19:58:20 +08:00
Jiaqi Liao
f20e5d1da5
Revise prefill strategy and performance metrics (#1675)
Updated the prefill strategy descriptions and performance benchmarks in the documentation.
2025-12-06 15:36:04 +08:00
Jiaqi Liao
1d62ac21f7
Update Kimi-K2-Thinking-Native.md (#1673)
2025-12-05 23:08:02 +08:00
Jiaqi Liao
69fa7b1a57
Revise installation steps in Kimi-K2 documentation (#1672)
Updated installation instructions and added steps for cloning the repository.
2025-12-05 23:05:24 +08:00
Jiaqi Liao
721b6c4c94
[docs] Update Native Kimi-K2-Thinking documentation and kt-kernel parameters (#1671)
2025-12-05 22:46:16 +08:00
Jiaqi Liao
47da806cde
[doc](kt-kernel): add kimi-k2-thinking (#1670)
2025-12-05 21:53:59 +08:00
ErvinXie
71f683acec
Support Native Kimi K2 Thinking (#1663)
* [feat]: fix k2 prefill
* Update Kimi-K2-Thinking.md
* Create Kimi-K2-Thinking-Native.md
* Update Kimi-K2-Thinking.md
* Update Kimi-K2-Thinking.md
* Update Kimi-K2-Thinking-Native.md
* [perf] optimize K2 MoE weight loading with per-expert pointers
- Avoid expensive torch.stack().contiguous() in Python (was ~6.6s)
- Use per-expert pointer arrays (gate_projs) instead of contiguous memory
- C++ worker pool performs parallel memcpy for TP slicing
- Add LOAD_TIME_PROFILE for load_weights timing analysis
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
---------
Co-authored-by: ouqingliang <1692110604@qq.com>
Co-authored-by: Claude <noreply@anthropic.com>
2025-12-05 21:53:05 +08:00
Jianwei Dong
670c488155
[docs]: Add deepseek-v3.2 run tutorial (#1659)
2025-12-02 20:04:10 +08:00
Peilin Li
e637fedc65
[docs]: Add full introduction of KT (#1636)
2025-11-29 15:46:55 +08:00
RICHARDNAN
2cffdf7033
[docs]: Update DeepseekR1_V3_tutorial_zh_for_Ascend_NPU.md (#1638)
2025-11-24 11:51:07 +08:00
Jiaqi Liao
46af8fcab5
[doc] fix kt parameters (#1629)
2025-11-19 16:41:57 +08:00
Peilin Li
171578a7ec
[refactor]: Rename 'KT-SFT' to 'kt-sft' (#1626)
* Rename 'KT-SFT' to 'kt-sft'
* [docs]: update kt-sft name
---------
Co-authored-by: ZiWei Yuan <yzwliam@126.com>
2025-11-17 11:48:42 +08:00
ZiWei Yuan
ab8ad0a110
[docs]: update web doc (#1625)
2025-11-16 14:40:22 +08:00
ZiWei Yuan
be6db6f46b
[docs]: improve structure for kt-kernel (#1624)
* [docs]: improve structure for kt-kernel
* Update doc/en/kt-kernel/README.md
2025-11-16 13:21:41 +08:00
ZiWei Yuan
133eea037c
[docs]: improve docs structure (#1623)
2025-11-16 12:40:59 +08:00
ZiWei Yuan
c2d2edbeef
[docs]: update the web docs structure (#1622)
2025-11-16 12:09:44 +08:00
ZiWei Yuan
c32fefb1cd
[doc]: update web doc and kt-kernel doc (#1609)
* [doc]: update web doc and kt-kernel doc
* [doc](book.toml): add book.toml for rust book compile
2025-11-13 20:44:13 +08:00
Peilin Li
148a030026
Upload hands-on tutorial for KTransformers-FT, especially on customizing KT-FT with LLaMA-Factory (#1597)
* Add files via upload
* upload hands-on tutorial for KTransformers-FT
2025-11-11 20:54:41 +08:00
Jiaqi Liao
57d14d22bc
Refactor: restructure repository to focus on kt-kernel and KT-SFT modules (#1581)
* refactor: move legacy code to archive/ directory
- Moved ktransformers, csrc, third_party, merge_tensors to archive/
- Moved build scripts and configurations to archive/
- Kept kt-kernel, KT-SFT, doc, and README files in root
- Preserved complete git history for all moved files
* refactor: restructure repository to focus on kt-kernel and KT-SFT modules
* fix README
* fix README
* fix README
* fix README
* docs: add performance benchmarks to kt-kernel section
Add comprehensive performance data for kt-kernel to match KT-SFT's presentation:
- AMX kernel optimization: 21.3 TFLOPS (3.9× faster than PyTorch)
- Prefill phase: up to 20× speedup vs baseline
- Decode phase: up to 4× speedup
- NUMA optimization: up to 63% throughput improvement
- Multi-GPU (8×L20): 227.85 tokens/s total throughput with DeepSeek-R1 FP8
Source: https://lmsys.org/blog/2025-10-22-KTransformers/
This provides users with concrete performance metrics for both core modules,
making it easier to understand the capabilities of each component.
* refactor: improve kt-kernel performance data with specific hardware and models
Replace generic performance descriptions with concrete benchmarks:
- Specify exact hardware: 8×L20 GPU + Xeon Gold 6454S, Single/Dual-socket Xeon + AMX
- Include specific models: DeepSeek-R1-0528 (FP8), DeepSeek-V3 (671B)
- Show detailed metrics: total throughput, output throughput, concurrency details
- Match KT-SFT presentation style for consistency
This provides users with actionable performance data they can use to evaluate
hardware requirements and expected performance for their use cases.
* fix README
* docs: clean up performance table and improve formatting
* add pic for README
* refactor: simplify .gitmodules and backup legacy submodules
- Remove 7 legacy submodules from root .gitmodules (archive/third_party/*)
- Keep only 2 active submodules for kt-kernel (llama.cpp, pybind11)
- Backup complete .gitmodules to archive/.gitmodules
- Add documentation in archive/README.md for researchers who need legacy submodules
This reduces initial clone size by ~500MB and avoids downloading unused dependencies.
* refactor: move doc/ back to root directory
Keep documentation in root for easier access and maintenance.
* refactor: consolidate all images to doc/assets/
- Move kt-kernel/assets/heterogeneous_computing.png to doc/assets/
- Remove KT-SFT/assets/ (images already in doc/assets/)
- Update KT-SFT/README.md image references to ../doc/assets/
- Eliminates ~7.9MB image duplication
- Centralizes all documentation assets in one location
* fix pic path for README
2025-11-10 17:42:26 +08:00
Wenzhang-Chen
62b7b28a16
fix typo (#1452)
2025-11-10 16:08:04 +08:00
Atream
b67cc4095d
Change attention backend to 'flashinfer' in launch command
Updated the launch command to include 'flashinfer' as the attention backend.
2025-11-08 20:56:09 +08:00
Peilin Li
f4fe137023
Merge pull request #1572 from JimmyPeilinLi/main
fix: remove py310 as guide
2025-11-08 16:57:10 +08:00
JimmyPeilinLi
1c08a4f0fb
fix: remove py310 as guide
2025-11-08 08:54:32 +00:00
Atream
0651dbda04
Simplify launch command by removing unused option
Removed the unused '--attention-backend triton' option from the launch command.
2025-11-08 16:54:18 +08:00
Atream
d6ee384fe2
Fix download link for Kimi-K2-Thinking weights
Updated the download link for AMX INT4 quantized weights.
2025-11-06 19:07:15 +08:00
Atream
d419024bb4
Add KTransformers SGLang inference documentation
Add documentation for KTransformers SGLang inference deployment, including installation steps, model download links, server launch instructions, and performance benchmarks.
2025-11-06 17:53:58 +08:00
Peilin Li
803e645bc1
Update SFT Installation Guide for KimiK2
Added installation instructions and usage examples for KimiK2.
2025-11-06 17:34:21 +08:00
Peilin Li
d7ec838d5a
Installation guide for KT+SFT (LoRA) with the KimiK2 model
2025-11-06 17:27:42 +08:00
ZiWei Yuan
8192cc4166
Merge pull request #1551 from kvcache-ai/JimmyPeilinLi-patch-1
Revise GPU/CPU memory footprint information
2025-11-05 12:23:28 +08:00
ZiWei Yuan
95814c72b2
Merge pull request #1550 from kvcache-ai/lpl-dev-1
Update installation instructions
2025-11-05 12:22:59 +08:00
Peilin Li
6721f8765d
Revise GPU/CPU memory footprint information
Updated memory footprint details for DeepSeek models.
2025-11-05 12:11:19 +08:00
Peilin Li
4f9940700e
Update installation instructions
2025-11-04 23:06:05 +08:00
Peilin Li
fe556bba34
Update installation instructions
2025-11-04 23:03:36 +08:00
KMSorSMS
0c15da437f
[feat](cmake & doc): fix cmake arch detection bug & update doc for sft
2025-11-04 08:46:26 +00:00
JimmyPeilinLi
7b6ccc3f57
add the docs and update README for KSFT
2025-11-04 05:51:48 +00:00
RICHARDNAN
6085dea039
Update DeepseekR1_V3_tutorial_zh_for_Ascend_NPU.md
2025-10-30 10:05:54 +08:00