Commit Graph

225 Commits

Jianwei Dong
779bf14556 [doc]: add Experts sched tutorial (#1802)
* Change num gpu experts to gpu expert masks and add eplb statistics

* [feat]: update examples

* [fix]: fix fp8 perchannel

* Delete useless tests

* Delete useless tests

* add experts_sched tutorial

---------

Co-authored-by: ouqingliang <1692110604@qq.com>
2026-01-22 15:40:07 +08:00
Peilin Li
a4de664e62 Add AutoDL Tutorial (#1801) 2026-01-22 14:52:47 +08:00
ErvinXie
d2305538f7 Modify installation steps in Kimi-K2-Thinking-Native.md (#1800)
Updated installation instructions for sglang repository.
2026-01-21 15:46:21 +08:00
ZiWei Yuan
b096b01fbc [docs]: add kt-cli doc and update corresponding website (#1768) 2025-12-29 23:06:22 +08:00
Oql
63796374c1 [docs]: fix and add MiniMax-M2 tutorial images. (#1752) 2025-12-25 20:14:35 +08:00
ZiWei Yuan
3315335fb1 [docs]: update docs to kt-kernel & add amd_blis doc (#1744) 2025-12-24 17:55:15 +08:00
ErvinXie
d8046e1bb4 Kt minimax (#1742)
[feat]: fp8 kernel and kt-cli support
2025-12-24 15:39:44 +08:00
mrhaoxx
e7d277d163 [docs]: refine README for dpo updates (#1740)
* [docs]: refine dpo tutorial

* [docs]: refine README for dpo updates

* Update doc/en/DPO_tutorial.md

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* [docs]: update website doc & refine location

---------

Co-authored-by: ErvinXie <ervinxie@foxmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: ZiWei Yuan <yzwliam@126.com>
2025-12-24 11:20:08 +08:00
mrhaoxx
dee1e211d5 [docs]: refine dpo tutorial (#1739) 2025-12-22 18:44:24 +08:00
Peilin Li
16d5d89f50 [docs]: Update Python version options in DPO tutorial (#1734) 2025-12-20 13:44:35 +08:00
Peilin Li
df998e0f36 [docs]: Add RL-DPO Tutorial (#1733) 2025-12-20 12:49:02 +08:00
Shaoxu Cheng
f25e58ad69 fix: qwen3-npu bugs; update: add readme-for-qwen3-npu (#1717)
* fix: qwen3-npu bugs; update: add readme-for-qwen3-npu

* fix: Correct the README description
2025-12-16 14:27:04 +08:00
RICHARDNAN
18fb8fc897 Npu revise benchmark results and prerequisites (#1716)
* Update DeepseekR1_V3_tutorial_zh_for_Ascend_NPU.md

* Update DeepseekR1_V3_tutorial_zh_for_Ascend_NPU.md

* Revise Ascend NPU tutorial for Docker deployment

Updated the tutorial for deploying on the Ascend NPU, changing sections from 'Conda部署' (Conda deployment) to '镜像部署' (image deployment) and providing specific commands for Docker container setup and Python environment installation.

* Update DeepseekR1 tutorial for Ascend NPU

* Update DeepseekR1_V3_tutorial_zh_for_Ascend_NPU.md

* Update W8A8 weight link in tutorial

* Update doc/zh/DeepseekR1_V3_tutorial_zh_for_Ascend_NPU.md

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Refactor Docker command and update package manager

Updated Docker run command to simplify device specifications and corrected package manager command from 'apt' to 'yum'.

* Update DeepseekR1_V3_tutorial_zh_for_Ascend_NPU.md

* Revise benchmark results and prerequisites

Updated performance results and hardware specifications.

* Update doc/zh/DeepseekR1_V3_tutorial_zh_for_Ascend_NPU.md

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-16 14:26:44 +08:00
RICHARDNAN
6431888928 add deploy in docker image (#1691) 2025-12-11 14:11:27 +08:00
ErvinXie
eefc8cf98d Update Kimi-K2-Thinking-Native.md (#1684) 2025-12-08 19:58:20 +08:00
Jiaqi Liao
f20e5d1da5 Revise prefill strategy and performance metrics (#1675)
Updated the prefill strategy descriptions and performance benchmarks in the documentation.
2025-12-06 15:36:04 +08:00
Jiaqi Liao
1d62ac21f7 Update Kimi-K2-Thinking-Native.md (#1673) 2025-12-05 23:08:02 +08:00
Jiaqi Liao
69fa7b1a57 Revise installation steps in Kimi-K2 documentation (#1672)
Updated installation instructions and added steps for cloning the repository.
2025-12-05 23:05:24 +08:00
Jiaqi Liao
721b6c4c94 [docs] Update Native Kimi-K2-Thinking documentation and kt-kernel parameters (#1671) 2025-12-05 22:46:16 +08:00
Jiaqi Liao
47da806cde [doc](kt-kernel): add kimi-k2-thinking (#1670) 2025-12-05 21:53:59 +08:00
ErvinXie
71f683acec Support Native Kimi K2 Thinking (#1663)
* [feat]: fix k2 prefill

* Update Kimi-K2-Thinking.md

* Create Kimi-K2-Thinking-Native.md

* Update Kimi-K2-Thinking.md

* Update Kimi-K2-Thinking.md

* Update Kimi-K2-Thinking-Native.md

* [perf] optimize K2 MoE weight loading with per-expert pointers

- Avoid expensive torch.stack().contiguous() in Python (was ~6.6s)
- Use per-expert pointer arrays (gate_projs) instead of contiguous memory
- C++ worker pool performs parallel memcpy for TP slicing
- Add LOAD_TIME_PROFILE for load_weights timing analysis

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: ouqingliang <1692110604@qq.com>
Co-authored-by: Claude <noreply@anthropic.com>
2025-12-05 21:53:05 +08:00
Jianwei Dong
670c488155 [docs]: Add deepseek-v3.2 run tutorial (#1659) 2025-12-02 20:04:10 +08:00
Peilin Li
e637fedc65 [docs]: Add Full introduction of KT (#1636) 2025-11-29 15:46:55 +08:00
RICHARDNAN
2cffdf7033 [docs]: Update DeepseekR1_V3_tutorial_zh_for_Ascend_NPU.md (#1638) 2025-11-24 11:51:07 +08:00
Jiaqi Liao
46af8fcab5 [doc] fix kt parameters (#1629) 2025-11-19 16:41:57 +08:00
Peilin Li
171578a7ec [refactor]: Change named 'KT-SFT' to 'kt-sft' (#1626)
* Change named 'KT-SFT' to 'kt-sft'

* [docs]: update kt-sft name

---------

Co-authored-by: ZiWei Yuan <yzwliam@126.com>
2025-11-17 11:48:42 +08:00
ZiWei Yuan
ab8ad0a110 [docs]: update web doc (#1625) 2025-11-16 14:40:22 +08:00
ZiWei Yuan
be6db6f46b [docs]: improve structure for kt-kernel (#1624)
* [docs]: improve structure for kt-kernel

* Update doc/en/kt-kernel/README.md
2025-11-16 13:21:41 +08:00
ZiWei Yuan
133eea037c [docs]: improve docs structure (#1623) 2025-11-16 12:40:59 +08:00
ZiWei Yuan
c2d2edbeef [docs]: update the web docs structure (#1622) 2025-11-16 12:09:44 +08:00
ZiWei Yuan
c32fefb1cd [doc]: update web doc and kt-kernel doc (#1609)
* [doc]: update web doc and kt-kernel doc

* [doc](book.toml): add book.toml for rust book compile
2025-11-13 20:44:13 +08:00
Peilin Li
148a030026 upload hands-on tutorial with KTransformers-FT, especially on customizing your KT-FT + LLaMA-Factory setup (#1597)
* Add files via upload

* upload hands-on tutorial for KTransformers-FT
2025-11-11 20:54:41 +08:00
Jiaqi Liao
57d14d22bc Refactor: restructure repository to focus on kt-kernel and KT-SFT modules (#1581)
* refactor: move legacy code to archive/ directory

  - Moved ktransformers, csrc, third_party, merge_tensors to archive/
  - Moved build scripts and configurations to archive/
  - Kept kt-kernel, KT-SFT, doc, and README files in root
  - Preserved complete git history for all moved files

* refactor: restructure repository to focus on kt-kernel and KT-SFT modules

* fix README

* fix README

* fix README

* fix README

* docs: add performance benchmarks to kt-kernel section

Add comprehensive performance data for kt-kernel to match KT-SFT's presentation:
- AMX kernel optimization: 21.3 TFLOPS (3.9× faster than PyTorch)
- Prefill phase: up to 20× speedup vs baseline
- Decode phase: up to 4× speedup
- NUMA optimization: up to 63% throughput improvement
- Multi-GPU (8×L20): 227.85 tokens/s total throughput with DeepSeek-R1 FP8

Source: https://lmsys.org/blog/2025-10-22-KTransformers/

This provides users with concrete performance metrics for both core modules,
making it easier to understand the capabilities of each component.

* refactor: improve kt-kernel performance data with specific hardware and models

Replace generic performance descriptions with concrete benchmarks:
- Specify exact hardware: 8×L20 GPU + Xeon Gold 6454S, Single/Dual-socket Xeon + AMX
- Include specific models: DeepSeek-R1-0528 (FP8), DeepSeek-V3 (671B)
- Show detailed metrics: total throughput, output throughput, concurrency details
- Match KT-SFT presentation style for consistency

This provides users with actionable performance data they can use to evaluate
hardware requirements and expected performance for their use cases.

* fix README

* docs: clean up performance table and improve formatting

* add pic for README

* refactor: simplify .gitmodules and backup legacy submodules

- Remove 7 legacy submodules from root .gitmodules (archive/third_party/*)
- Keep only 2 active submodules for kt-kernel (llama.cpp, pybind11)
- Backup complete .gitmodules to archive/.gitmodules
- Add documentation in archive/README.md for researchers who need legacy submodules

This reduces initial clone size by ~500MB and avoids downloading unused dependencies.

* refactor: move doc/ back to root directory

Keep documentation in root for easier access and maintenance.

* refactor: consolidate all images to doc/assets/

- Move kt-kernel/assets/heterogeneous_computing.png to doc/assets/
- Remove KT-SFT/assets/ (images already in doc/assets/)
- Update KT-SFT/README.md image references to ../doc/assets/
- Eliminates ~7.9MB image duplication
- Centralizes all documentation assets in one location

* fix pic path for README
2025-11-10 17:42:26 +08:00
Wenzhang-Chen
62b7b28a16 fix typo (#1452) 2025-11-10 16:08:04 +08:00
Atream
b67cc4095d Change attention backend to 'flashinfer' in launch command
Updated the launch command to include 'flashinfer' as the attention backend.
2025-11-08 20:56:09 +08:00
Peilin Li
f4fe137023 Merge pull request #1572 from JimmyPeilinLi/main
fix: remove py310 as guide
2025-11-08 16:57:10 +08:00
JimmyPeilinLi
1c08a4f0fb fix: remove py310 as guide 2025-11-08 08:54:32 +00:00
Atream
0651dbda04 Simplify launch command by removing unused option
Removed the unused '--attention-backend triton' option from the launch command.
2025-11-08 16:54:18 +08:00
Atream
d6ee384fe2 Fix download link for Kimi-K2-Thinking weights
Updated the download link for AMX INT4 quantized weights.
2025-11-06 19:07:15 +08:00
Atream
d419024bb4 Add KTransformers SGLang inference documentation
Add documentation for KTransformers SGLang inference deployment, including installation steps, model download links, server launch instructions, and performance benchmarks.
2025-11-06 17:53:58 +08:00
Peilin Li
803e645bc1 Update SFT Installation Guide for KimiK2
Added installation instructions and usage examples for KimiK2.
2025-11-06 17:34:21 +08:00
Peilin Li
d7ec838d5a installation guide for KT+SFT(LoRA) in KimiK2 model 2025-11-06 17:27:42 +08:00
ZiWei Yuan
8192cc4166 Merge pull request #1551 from kvcache-ai/JimmyPeilinLi-patch-1
Revise GPU/CPU memory footprint information
2025-11-05 12:23:28 +08:00
ZiWei Yuan
95814c72b2 Merge pull request #1550 from kvcache-ai/lpl-dev-1
Update installation instructions
2025-11-05 12:22:59 +08:00
Peilin Li
6721f8765d Revise GPU/CPU memory footprint information
Updated memory footprint details for DeepSeek models.
2025-11-05 12:11:19 +08:00
Peilin Li
4f9940700e Update installation instructions 2025-11-04 23:06:05 +08:00
Peilin Li
fe556bba34 Update installation instructions 2025-11-04 23:03:36 +08:00
KMSorSMS
0c15da437f [feat](cmake & doc): fix bug with cmake arch detect & update doc for sft 2025-11-04 08:46:26 +00:00
JimmyPeilinLi
7b6ccc3f57 add the docs and update README for KSFT 2025-11-04 05:51:48 +00:00
RICHARDNAN
6085dea039 Update DeepseekR1_V3_tutorial_zh_for_Ascend_NPU.md 2025-10-30 10:05:54 +08:00