Commit Graph

1192 Commits

Author SHA1 Message Date
SCDESPERTATE
b0f827d2a9 [chore](cuda): explicitly use ele_per_blk var for better readability (#1784) 2026-01-23 11:11:08 +08:00
Jianwei Dong
779bf14556 [doc]: add Experts sched tutorial (#1802)
* Change num gpu experts to gpu expert masks and add eplb statistics

* [feat]: update examples

* [fix]: fix fp8 perchannel

* Delete useless tests

* Delete useless tests

* add experts_sched tutorial

---------

Co-authored-by: ouqingliang <1692110604@qq.com>
v0.5.1
2026-01-22 15:40:07 +08:00
Peilin Li
a4de664e62 Add AutoDL Tutorial (#1801) 2026-01-22 14:52:47 +08:00
ErvinXie
d2305538f7 Modify installation steps in Kimi-K2-Thinking-Native.md (#1800)
Updated installation instructions for sglang repository.
2026-01-21 15:46:21 +08:00
mrhaoxx
b27de4068b [fix]: fix exp_avx512 for act_fn (#1797) 2026-01-20 11:07:22 +08:00
Jianwei Dong
027832c590 [feat](kt-kernel): CPU-GPU experts sched (#1796) 2026-01-16 17:01:15 +08:00
Oql
6277da4c2b support GLM 4.7 (#1791)
support GLM 4.7
2026-01-13 17:36:25 +08:00
watamario15
667030d6e6 [kt-kernel]: Fix ignored build configurations in install.sh and CMakeLists.txt (#1789)
* Correct variable defaults

* Remove CMAKE_BUILD_TYPE setting in CMakeLists
2026-01-12 22:16:19 +08:00
Oql
5edc456749 support Native BF16 format MoE. (#1788)
support Native BF16 format MoE
2026-01-12 14:43:28 +08:00
Oql
ddb957596f Fix moe bug. (#1783)
* [fix]: fix moe.hpp load from file bug.

* [fix]: fix all moe hpp init bug.

* [fix]: fix moe & awq-moe bug.
2026-01-05 17:02:24 +08:00
Oql
dc6394e501 [fix]: fix moe hpp bug. (#1780)
fix moe hpp init bug.
2026-01-04 19:32:56 +08:00
ZiWei Yuan
ad7674a6d5 [ci]: Patch ci (#1772)
* [docs]: add kt-cli doc and update corresponding website

* [feat]: update issue template
2025-12-31 12:10:20 +08:00
Jianwei Dong
6d2d7cb057 bump to 0.5.0.post1 (#1771) v0.5.0.post1 2025-12-30 11:09:54 +08:00
Jianwei Dong
47b1bfcff6 Update release-pypi.yml (#1770) 2025-12-30 10:47:28 +08:00
Jianwei Dong
9adc91714f Remove kt-kernel-cuda, kt-kernel uses the version with cuda (#1769) 2025-12-30 10:23:58 +08:00
ZiWei Yuan
b096b01fbc [docs]: add kt-cli doc and update corresponding website (#1768) 2025-12-29 23:06:22 +08:00
ErvinXie
9539ab91eb Cli (#1765)
* [feat]: add custom option for kt run

* [feat]: depth 3
2025-12-29 15:18:42 +08:00
Jianwei Dong
4b235cdaa4 fix cuda wheel build (#1766) 2025-12-29 12:42:06 +08:00
Jianwei Dong
7c127d9fd0 Update release-pypi.yml (#1764) 2025-12-29 11:48:55 +08:00
Jianwei Dong
559a3ad4ac fix pypi cuda install (#1763) 2025-12-29 11:19:43 +08:00
Oql
63796374c1 [docs]: fix and add MiniMax-M2 tutorial images. (#1752) 2025-12-25 20:14:35 +08:00
Jiaqi Liao
be668074de Update tutorial links in README.md (#1749) 2025-12-25 14:26:10 +08:00
Jiaqi Liao
46b0f36980 [feat](kt-kernel): Fix CPU instruction set variants for build & install (#1746)
* [feat]: Enhance CPU feature detection and support for AVX512 extensions

- Added cmake/DetectCPU.cmake for automatic CPU feature detection.
- Updated CMakeLists.txt to include auto-detection logic for AVX512 features.
- Modified install.sh to include new AVX512_VBMI option for FP8 MoE.
- Enhanced _cpu_detect.py to support progressive matching of CPU variants.
- Created scripts/check_cpu_features.py for manual CPU feature checks.
- Updated setup.py to reflect changes in CPU variant building and environment variables.

* [fix](kt-kernel): Add conditional inclusion of FP8 MoE for AVX512 BF16 support

* [chore](kt-kernel): update project version to 0.5.0 in CMakeLists.txt and version.py
2025-12-24 18:57:45 +08:00
ZiWei Yuan
dc5feece8f [docs]: update doc link (#1745) 2025-12-24 18:00:47 +08:00
ZiWei Yuan
3315335fb1 [docs]: update docs to kt-kernel & add amd_blis doc (#1744) 2025-12-24 17:55:15 +08:00
Oql
1d7a476523 [fix]: fix version to 0.5.0 (#1743) v0.5.0 2025-12-24 17:33:22 +08:00
ErvinXie
d8046e1bb4 Kt minimax (#1742)
[feat]: fp8 kernel and kt-cli support
2025-12-24 15:39:44 +08:00
mrhaoxx
e7d277d163 [docs]: refine README for dpo updates (#1740)
* [docs]: refine dpo tutorial

* [docs]: refine README for dpo updates

* Update doc/en/DPO_tutorial.md

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* [docs]: update website doc & refine location

---------

Co-authored-by: ErvinXie <ervinxie@foxmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: ZiWei Yuan <yzwliam@126.com>
2025-12-24 11:20:08 +08:00
mrhaoxx
dee1e211d5 [docs]: refine dpo tutorial (#1739) 2025-12-22 18:44:24 +08:00
Peilin Li
0bce173e3b [feat]: Release version to 0.4.4 (#1738) v0.4.4 2025-12-22 11:20:40 +08:00
Peilin Li
16d5d89f50 [docs]: Update Python version options in DPO tutorial (#1734) 2025-12-20 13:44:35 +08:00
Peilin Li
df998e0f36 [docs]: Add RL-DPO Tutorial (#1733) 2025-12-20 12:49:02 +08:00
Jianwei Dong
39449ed1af update PyPI Install and readme (#1731) 2025-12-18 17:21:47 +08:00
Jiaqi Liao
3c134359bc Fix CPU Instruction Set and Installation (#1729)
* [fix](kt-kernel): fix AVX512 cpu instruction set detection

* [feat](kt-kernel): AVX512 fallback kernel for RAW-INT4

* [fix](kt-kernel): fix setup version issue

* [fix](kt-kernel): update install for custom build

* [docs](kt-kernel): new installation guide for various cpu instruction set

* [fix](kt-kernel): fix _mm512_dpbusd_epi32_compat fallback implementation

* [style](kt-kernel): clang format
2025-12-18 00:11:57 +08:00
ErvinXie
a8667ddb58 [fix](test): fix import kt-kernel (#1728) 2025-12-17 19:46:32 +08:00
SCDESPERTATE
6fc4080a7d [fix](kt-kernel): fix typo in moe-tp's forward time-profiling (#1720)
* [fix](kt-kernel): fix typo in moe-tp's forward time-profiling

* [fix](kt-kernel): fix the experts count in profiling

---------

Co-authored-by: KMSorSMS <yzwliam@126.com>
2025-12-17 12:06:33 +08:00
Jianwei Dong
661e19a8e5 Update release-pypi.yml (#1726) 2025-12-16 20:37:20 +08:00
Jianwei Dong
5ba3fb56d1 Update release-pypi.yml (#1725) 2025-12-16 20:30:03 +08:00
Jianwei Dong
3126b8deaa Update release-pypi.yml (#1724) 2025-12-16 17:45:29 +08:00
Jianwei Dong
6dcacd9daf Update release-pypi.yml (#1723) 2025-12-16 17:42:28 +08:00
Jianwei Dong
fe8049d3a9 Update release-pypi.yml (#1722) 2025-12-16 17:35:29 +08:00
Jianwei Dong
5ff0026fc1 Update release-pypi.yml (#1721) 2025-12-16 17:29:29 +08:00
Jianwei Dong
1f79f6da92 [feat](kt-kernel): Add automatic deployment workflow (#1719) 2025-12-16 15:20:06 +08:00
Shaoxu Cheng
f25e58ad69 fix: qwen3-npu bugs; update: add readme-for-qwen3-npu (#1717)
* fix: qwen3-npu bugs; update: add readme-for-qwen3-npu

* fix: Correct the README description
2025-12-16 14:27:04 +08:00
RICHARDNAN
18fb8fc897 Npu revise benchmark results and prerequisites (#1716)
* Update DeepseekR1_V3_tutorial_zh_for_Ascend_NPU.md

* Update DeepseekR1_V3_tutorial_zh_for_Ascend_NPU.md

* Revise Ascend NPU tutorial for Docker deployment

Updated the tutorial for deploying the Ascend NPU, changing sections from 'Conda部署' (Conda deployment) to '镜像部署' (image deployment) and providing specific commands for Docker container setup and Python environment installation.

* Update DeepseekR1 tutorial for Ascend NPU

* Update DeepseekR1_V3_tutorial_zh_for_Ascend_NPU.md

* Update W8A8 weight link in tutorial

* Update doc/zh/DeepseekR1_V3_tutorial_zh_for_Ascend_NPU.md

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Refactor Docker command and update package manager

Updated Docker run command to simplify device specifications and corrected package manager command from 'apt' to 'yum'.

* Update DeepseekR1_V3_tutorial_zh_for_Ascend_NPU.md

* Revise benchmark results and prerequisites

Updated performance results and hardware specifications.

* Update doc/zh/DeepseekR1_V3_tutorial_zh_for_Ascend_NPU.md

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-16 14:26:44 +08:00
ZiWei Yuan
34230eaf44 [docs]: Fix image link in README.md (#1718)
Updated image link to use raw GitHub URL for better accessibility.
2025-12-15 17:10:15 +08:00
SCDESPERTATE
008de19e16 [fix](kt-kernel): drop the weights held in Python for loading weights operation in C++ (#1695) 2025-12-12 11:42:33 +08:00
Shaoxu Cheng
1e69563363 update: add cache class and ascend ln mlp op for qwen3 adapt npu (#1708) 2025-12-11 17:08:35 +08:00
Shaoxu Cheng
cea490a326 update: add ascend attn and experts ops for npu qwen3moe adapt (#1707)
* update: add ascend attn and experts ops for npu qwen3moe adapt

* Reorder import statements in custom_ascend_modelling_qwen3.py

* Restore copyright and import statements

Restored copyright information and imports in ascend_experts.py.
2025-12-11 17:08:15 +08:00
Shaoxu Cheng
adcfa9080f update: Qwen3 MoE model adaptation for NPU (framework) (#1706) 2025-12-11 17:07:57 +08:00