242 Commits

Author SHA1 Message Date
Jianwei Dong
15c624dcae Fix/sglang kt detection (#1875)
* [feat]: simplify sglang installation with submodule, auto-sync CI, and version alignment

- Add kvcache-ai/sglang as git submodule at third_party/sglang (branch = main)
- Add top-level install.sh for one-click source installation (sglang + kt-kernel)
- Add sglang-kt as hard dependency in kt-kernel/pyproject.toml
- Add CI workflow to auto-sync sglang submodule daily and create PR
- Add CI workflow to build and publish sglang-kt to PyPI
- Integrate sglang-kt build into release-pypi.yml (version.py bump publishes both packages)
- Align sglang-kt version with ktransformers via SGLANG_KT_VERSION env var injection
- Update Dockerfile to use submodule and inject aligned version
- Update all 13 doc files, CLI hints, and i18n strings to reference new install methods
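The version-alignment step in the list above can be sketched in a few lines. `SGLANG_KT_VERSION` is the environment variable named in the commit; `resolve_sglang_kt_version` and its fallback default are hypothetical illustration, not the actual build code.

```python
import os

def resolve_sglang_kt_version(default: str = "0.0.0") -> str:
    """Return the version injected via SGLANG_KT_VERSION, else a fallback.

    The commit describes the release workflow exporting this variable so
    sglang-kt is published with the same version string as ktransformers;
    the default here is a placeholder for illustration only.
    """
    return os.environ.get("SGLANG_KT_VERSION", default)
```

A build script would call this once and write the result into the package metadata, so both wheels always carry matching versions.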

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [build]: bump version to 0.5.2

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [build]: rename PyPI package from kt-kernel to ktransformers

Users can now `pip install ktransformers` to get everything
(sglang-kt is auto-installed as a dependency).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Revert "[build]: rename PyPI package from kt-kernel to ktransformers"

This reverts commit e0cbbf6364.

* [build]: add ktransformers meta-package for PyPI

`pip install ktransformers` now works as a single install command.
It pulls kt-kernel (which in turn pulls sglang-kt).
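A meta-package like this carries no code of its own, so whether an environment actually resolved the dependency chain can be checked from stdlib package metadata. `direct_dependencies` is a hypothetical helper, not part of ktransformers:

```python
from importlib import metadata

def direct_dependencies(package: str) -> list[str]:
    """Return a distribution's declared Requires-Dist entries, or [] if
    the distribution is not installed. For the meta-package described
    above, the list would name kt-kernel, which in turn pulls sglang-kt.
    """
    try:
        return metadata.requires(package) or []
    except metadata.PackageNotFoundError:
        return []
```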

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [fix]: show sglang-kt package version in kt version command

- Prioritize sglang-kt package version (aligned with ktransformers)
  over sglang internal __version__
- Update display name from "sglang" to "sglang-kt"

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [fix]: improve sglang-kt detection in kt doctor and kt version

Recognize sglang-kt package name as proof of kvcache-ai fork installation.
Previously both commands fell through to "PyPI (not recommended)" for
non-editable local source installs. Now version.py reuses the centralized
check_sglang_installation() logic.
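The detection order the message describes could look roughly like this. `detect_sglang` is a hypothetical stand-in for the centralized `check_sglang_installation()` logic, not the actual implementation:

```python
from importlib import metadata

def detect_sglang() -> tuple[str, str]:
    """Prefer the sglang-kt distribution (its version is aligned with
    ktransformers); fall back to vanilla sglang only when the fork's
    package metadata is absent."""
    for dist, label in (
        ("sglang-kt", "sglang-kt"),
        ("sglang", "sglang (PyPI, not recommended)"),
    ):
        try:
            return label, metadata.version(dist)
        except metadata.PackageNotFoundError:
            continue
    return "sglang", "not installed"
```

Keying on the distribution name rather than `sglang.__version__` is what lets non-editable local source installs still register as the kvcache-ai fork.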

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [build]: bump version to 0.5.2.post1

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-04 16:54:48 +08:00
Jianwei Dong
16a8b98f3e support qwen3.5 (#1846) 2026-02-16 15:48:14 +08:00
Jiaqi Liao
a3d5d53605 Update MiniMax-M2.5.md (#1849) 2026-02-13 22:35:36 +08:00
Jianwei Dong
f0e4fc612b support minimax-m2.5 (#1848) 2026-02-13 19:15:44 +08:00
Oql
1c72b3f5bd fix glm5 docs (#1845) 2026-02-12 02:33:37 +08:00
Oql
7f7aeaeff6 support glm 5 (#1844) 2026-02-12 02:03:32 +08:00
Jiaqi Liao
061fb56382 Update Kimi-K2.5.md (#1838) 2026-02-07 16:38:39 +08:00
Oql
4f64665758 [docs]: add Qwen3 Coder Next Tutorial (#1833) 2026-02-04 16:27:10 +08:00
Jiaqi Liao
794c04fae4 Revert "[doc]: update kimi_k2.5 doc (#1823)" (#1825)
This reverts commit 2e6506535b.
2026-01-30 16:10:01 +08:00
Oql
ccbb5b1cf8 [docs]: add clawd bot docs
2026-01-30 15:51:30 +08:00
Jiaqi Liao
2e6506535b [doc]: update kimi_k2.5 doc (#1823) 2026-01-30 15:43:18 +08:00
Peilin Li
8321d00cc5 Add files via upload (#1814) 2026-01-27 17:44:50 +08:00
Jiaqi Liao
2f6f7f1921 Kimi k2.5 doc (#1812)
* [doc]: add Kimi-K2.5 deploy&sft guide

* [doc]: add Kimi-K2.5 deploy&sft guide
2026-01-27 13:33:25 +08:00
Jiaqi Liao
1da075a3fa Revert "[doc]: add Kimi-K2.5 deploy&sft guide (#1810)" (#1811)
This reverts commit a368140d76.
2026-01-27 10:05:13 +08:00
Jiaqi Liao
a368140d76 [doc]: add Kimi-K2.5 deploy&sft guide (#1810) 2026-01-27 10:02:59 +08:00
Oql
bf4c8a690b Add Native Precision Tutorial, update worker strategy and README.md (#1807) 2026-01-23 18:00:13 +08:00
Jianwei Dong
8652346e69 [fix]: doc (#1805) 2026-01-23 11:17:08 +08:00
Jianwei Dong
779bf14556 [doc]: add Experts sched tutorial (#1802)
* Change num gpu experts to GPU expert masks and add EPLB statistics

* [feat]: update examples

* [fix]: fix fp8 perchannel

* Delete useless tests

* Delete useless tests

* add experts_sched tutorial

---------

Co-authored-by: ouqingliang <1692110604@qq.com>
2026-01-22 15:40:07 +08:00
Peilin Li
a4de664e62 Add AutoDL Tutorial (#1801) 2026-01-22 14:52:47 +08:00
ErvinXie
d2305538f7 Modify installation steps in Kimi-K2-Thinking-Native.md (#1800)
Updated installation instructions for sglang repository.
2026-01-21 15:46:21 +08:00
ZiWei Yuan
b096b01fbc [docs]: add kt-cli doc and update corresponding website (#1768) 2025-12-29 23:06:22 +08:00
Oql
63796374c1 [docs]: fix and add MiniMax-M2 tutorial images. (#1752) 2025-12-25 20:14:35 +08:00
ZiWei Yuan
3315335fb1 [docs]: update docs to kt-kernel & add amd_blis doc (#1744) 2025-12-24 17:55:15 +08:00
ErvinXie
d8046e1bb4 Kt minimax (#1742)
[feat]: fp8 kernel and kt-cli support
2025-12-24 15:39:44 +08:00
mrhaoxx
e7d277d163 [docs]: refine README for dpo updates (#1740)
* [docs]: refine dpo tutorial

* [docs]: refine README for dpo updates

* Update doc/en/DPO_tutorial.md

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* [docs]: update website doc & refine location

---------

Co-authored-by: ErvinXie <ervinxie@foxmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: ZiWei Yuan <yzwliam@126.com>
2025-12-24 11:20:08 +08:00
mrhaoxx
dee1e211d5 [docs]: refine dpo tutorial (#1739) 2025-12-22 18:44:24 +08:00
Peilin Li
16d5d89f50 [docs]: Update Python version options in DPO tutorial (#1734) 2025-12-20 13:44:35 +08:00
Peilin Li
df998e0f36 [docs]: Add RL-DPO Tutorial (#1733) 2025-12-20 12:49:02 +08:00
Shaoxu Cheng
f25e58ad69 fix: qwen3-npu bugs; update: add readme-for-qwen3-npu (#1717)
* fix: qwen3-npu bugs; update: add readme-for-qwen3-npu

* fix: Correct the README description
2025-12-16 14:27:04 +08:00
RICHARDNAN
18fb8fc897 Npu revise benchmark results and prerequisites (#1716)
* Update DeepseekR1_V3_tutorial_zh_for_Ascend_NPU.md

* Update DeepseekR1_V3_tutorial_zh_for_Ascend_NPU.md

* Revise Ascend NPU tutorial for Docker deployment

Updated the tutorial for deploying on the Ascend NPU, changing the 'Conda部署' (Conda deployment) section to '镜像部署' (image deployment) and providing specific commands for Docker container setup and Python environment installation.

* Update DeepseekR1 tutorial for Ascend NPU

* Update DeepseekR1_V3_tutorial_zh_for_Ascend_NPU.md

* Update W8A8 weight link in tutorial

* Update doc/zh/DeepseekR1_V3_tutorial_zh_for_Ascend_NPU.md

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Refactor Docker command and update package manager

Updated Docker run command to simplify device specifications and corrected package manager command from 'apt' to 'yum'.

* Update DeepseekR1_V3_tutorial_zh_for_Ascend_NPU.md

* Revise benchmark results and prerequisites

Updated performance results and hardware specifications.

* Update doc/zh/DeepseekR1_V3_tutorial_zh_for_Ascend_NPU.md

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-16 14:26:44 +08:00
RICHARDNAN
6431888928 add deploy in docker image (#1691) 2025-12-11 14:11:27 +08:00
ErvinXie
eefc8cf98d Update Kimi-K2-Thinking-Native.md (#1684) 2025-12-08 19:58:20 +08:00
Jiaqi Liao
f20e5d1da5 Revise prefill strategy and performance metrics (#1675)
Updated the prefill strategy descriptions and performance benchmarks in the documentation.
2025-12-06 15:36:04 +08:00
Jiaqi Liao
1d62ac21f7 Update Kimi-K2-Thinking-Native.md (#1673) 2025-12-05 23:08:02 +08:00
Jiaqi Liao
69fa7b1a57 Revise installation steps in Kimi-K2 documentation (#1672)
Updated installation instructions and added steps for cloning the repository.
2025-12-05 23:05:24 +08:00
Jiaqi Liao
721b6c4c94 [docs] Update Native Kimi-K2-Thinking documentation and kt-kernel parameters (#1671) 2025-12-05 22:46:16 +08:00
Jiaqi Liao
47da806cde [doc](kt-kernel): add kimi-k2-thinking (#1670) 2025-12-05 21:53:59 +08:00
ErvinXie
71f683acec Support Native Kimi K2 Thinking (#1663)
* [feat]: fix k2 prefill

* Update Kimi-K2-Thinking.md

* Create Kimi-K2-Thinking-Native.md

* Update Kimi-K2-Thinking.md

* Update Kimi-K2-Thinking.md

* Update Kimi-K2-Thinking-Native.md

* [perf] optimize K2 MoE weight loading with per-expert pointers

- Avoid expensive torch.stack().contiguous() in Python (was ~6.6s)
- Use per-expert pointer arrays (gate_projs) instead of contiguous memory
- C++ worker pool performs parallel memcpy for TP slicing
- Add LOAD_TIME_PROFILE for load_weights timing analysis
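The per-expert-pointer idea above can be illustrated with stdlib arrays (the real code uses torch tensors and names like `gate_projs`; everything below is a simplified stand-in): instead of stacking all expert weights into one contiguous copy, hand the native side one base address per expert.

```python
from array import array

def expert_pointers(expert_weights: list[array]) -> list[int]:
    """One raw base address per expert. A native worker pool (the C++
    side in the commit) could memcpy from these addresses in parallel
    for TP slicing, skipping the expensive stack-and-copy in Python."""
    return [w.buffer_info()[0] for w in expert_weights]

# Four experts, each with its own independently allocated weight buffer.
experts = [array("f", [0.0] * 64) for _ in range(4)]
ptrs = expert_pointers(experts)
```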

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: ouqingliang <1692110604@qq.com>
Co-authored-by: Claude <noreply@anthropic.com>
2025-12-05 21:53:05 +08:00
Jianwei Dong
670c488155 [docs]: Add deepseek-v3.2 run tutorial (#1659) 2025-12-02 20:04:10 +08:00
Peilin Li
e637fedc65 [docs]: Add Full introduction of KT (#1636) 2025-11-29 15:46:55 +08:00
RICHARDNAN
2cffdf7033 [docs]: Update DeepseekR1_V3_tutorial_zh_for_Ascend_NPU.md (#1638) 2025-11-24 11:51:07 +08:00
Jiaqi Liao
46af8fcab5 [doc] fix kt parameters (#1629) 2025-11-19 16:41:57 +08:00
Peilin Li
171578a7ec [refactor]: Change named 'KT-SFT' to 'kt-sft' (#1626)
* Change named 'KT-SFT' to 'kt-sft'

* [docs]: update kt-sft name

---------

Co-authored-by: ZiWei Yuan <yzwliam@126.com>
2025-11-17 11:48:42 +08:00
ZiWei Yuan
ab8ad0a110 [docs]: update web doc (#1625) 2025-11-16 14:40:22 +08:00
ZiWei Yuan
be6db6f46b [docs]: improve structure for kt-kernel (#1624)
* [docs]: improve structure for kt-kernel

* Update doc/en/kt-kernel/README.md
2025-11-16 13:21:41 +08:00
ZiWei Yuan
133eea037c [docs]: improve docs structure (#1623) 2025-11-16 12:40:59 +08:00
ZiWei Yuan
c2d2edbeef [docs]: update the web docs structure (#1622) 2025-11-16 12:09:44 +08:00
ZiWei Yuan
c32fefb1cd [doc]: update web doc and kt-kernel doc (#1609)
* [doc]: update web doc and kt-kernel doc

* [doc](book.toml): add book.toml for rust book compile
2025-11-13 20:44:13 +08:00
Peilin Li
148a030026 Upload hands-on tutorial for KTransformers-FT, especially on customizing KT-FT + LLaMA-Factory (#1597)
* Add files via upload

* upload hands-on tutorial for KTransformers-FT
2025-11-11 20:54:41 +08:00
Jiaqi Liao
57d14d22bc Refactor: restructure repository to focus on kt-kernel and KT-SFT modules (#1581)
* refactor: move legacy code to archive/ directory

  - Moved ktransformers, csrc, third_party, merge_tensors to archive/
  - Moved build scripts and configurations to archive/
  - Kept kt-kernel, KT-SFT, doc, and README files in root
  - Preserved complete git history for all moved files

* refactor: restructure repository to focus on kt-kernel and KT-SFT modules

* fix README

* fix README

* fix README

* fix README

* docs: add performance benchmarks to kt-kernel section

Add comprehensive performance data for kt-kernel to match KT-SFT's presentation:
- AMX kernel optimization: 21.3 TFLOPS (3.9× faster than PyTorch)
- Prefill phase: up to 20× speedup vs baseline
- Decode phase: up to 4× speedup
- NUMA optimization: up to 63% throughput improvement
- Multi-GPU (8×L20): 227.85 tokens/s total throughput with DeepSeek-R1 FP8

Source: https://lmsys.org/blog/2025-10-22-KTransformers/

This provides users with concrete performance metrics for both core modules,
making it easier to understand the capabilities of each component.

* refactor: improve kt-kernel performance data with specific hardware and models

Replace generic performance descriptions with concrete benchmarks:
- Specify exact hardware: 8×L20 GPU + Xeon Gold 6454S, Single/Dual-socket Xeon + AMX
- Include specific models: DeepSeek-R1-0528 (FP8), DeepSeek-V3 (671B)
- Show detailed metrics: total throughput, output throughput, concurrency details
- Match KT-SFT presentation style for consistency

This provides users with actionable performance data they can use to evaluate
hardware requirements and expected performance for their use cases.

* fix README

* docs: clean up performance table and improve formatting

* add pic for README

* refactor: simplify .gitmodules and backup legacy submodules

- Remove 7 legacy submodules from root .gitmodules (archive/third_party/*)
- Keep only 2 active submodules for kt-kernel (llama.cpp, pybind11)
- Backup complete .gitmodules to archive/.gitmodules
- Add documentation in archive/README.md for researchers who need legacy submodules

This reduces initial clone size by ~500MB and avoids downloading unused dependencies.

* refactor: move doc/ back to root directory

Keep documentation in root for easier access and maintenance.

* refactor: consolidate all images to doc/assets/

- Move kt-kernel/assets/heterogeneous_computing.png to doc/assets/
- Remove KT-SFT/assets/ (images already in doc/assets/)
- Update KT-SFT/README.md image references to ../doc/assets/
- Eliminates ~7.9MB image duplication
- Centralizes all documentation assets in one location

* fix pic path for README
2025-11-10 17:42:26 +08:00