Commit Graph

1166 Commits

Author SHA1 Message Date
ErvinXie
d8046e1bb4 Kt minimax (#1742)
[feat]: fp8 kernel and kt-cli support
2025-12-24 15:39:44 +08:00
mrhaoxx
e7d277d163 [docs]: refine README for dpo updates (#1740)
* [docs]: refine dpo tutorial

* [docs]: refine README for dpo updates

* Update doc/en/DPO_tutorial.md

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* [docs]: update website doc & refine location

---------

Co-authored-by: ErvinXie <ervinxie@foxmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: ZiWei Yuan <yzwliam@126.com>
2025-12-24 11:20:08 +08:00
mrhaoxx
dee1e211d5 [docs]: refine dpo tutorial (#1739) 2025-12-22 18:44:24 +08:00
Peilin Li
0bce173e3b [feat]: Release version to 0.4.4 (#1738) v0.4.4 2025-12-22 11:20:40 +08:00
Peilin Li
16d5d89f50 [docs]: Update Python version options in DPO tutorial (#1734) 2025-12-20 13:44:35 +08:00
Peilin Li
df998e0f36 [docs]: Add RL-DPO Tutorial (#1733) 2025-12-20 12:49:02 +08:00
Jianwei Dong
39449ed1af update PyPI Install and readme (#1731) 2025-12-18 17:21:47 +08:00
Jiaqi Liao
3c134359bc Fix CPU Instruction Set and Installation (#1729)
* [fix](kt-kernel): fix AVX512 cpu instruction set detection

* [feat](kt-kernel): AVX512 fallback kernel for RAW-INT4

* [fix](kt-kernel): fix setup version issue

* [fix](kt-kernel): update install for custom build

* [docs](kt-kernel): new installation guide for various cpu instruction sets

* [fix](kt-kernel): fix _mm512_dpbusd_epi32_compat fallback implementation

* [style](kt-kernel): clang format
2025-12-18 00:11:57 +08:00
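The fallback commit above targets `_mm512_dpbusd_epi32`, an AVX512-VNNI instruction. As a hedged sketch (not the repo's actual code), its per-lane semantics can be emulated in plain Python: each 32-bit lane accumulates the dot product of four unsigned 8-bit values with four signed 8-bit values.

```python
# Illustrative emulation of VPDPBUSD semantics; function names are
# invented here, not taken from kt-kernel.

def dpbusd_lane(acc: int, a_u8: list[int], b_s8: list[int]) -> int:
    """One 32-bit lane: acc += sum of four u8*s8 products."""
    assert len(a_u8) == len(b_s8) == 4
    return acc + sum(u * s for u, s in zip(a_u8, b_s8))

def dpbusd_512(acc: list[int], a_u8: list[int], b_s8: list[int]) -> list[int]:
    """A full 512-bit register: 16 lanes of 4 bytes each."""
    return [dpbusd_lane(acc[i], a_u8[4 * i:4 * i + 4], b_s8[4 * i:4 * i + 4])
            for i in range(16)]
```

A scalar loop like this is what a non-VNNI fallback kernel has to reproduce exactly, byte ordering and sign handling included.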
ErvinXie
a8667ddb58 [fix](test): fix import kt-kernel (#1728) 2025-12-17 19:46:32 +08:00
SCDESPERTATE
6fc4080a7d [fix](kt-kernel): fix typo in moe-tp's forward time-profiling (#1720)
* [fix](kt-kernel): fix typo in moe-tp's forward time-profiling

* [fix](kt-kernel): fix the experts count in profiling

---------

Co-authored-by: KMSorSMS <yzwliam@126.com>
2025-12-17 12:06:33 +08:00
Jianwei Dong
661e19a8e5 Update release-pypi.yml (#1726) 2025-12-16 20:37:20 +08:00
Jianwei Dong
5ba3fb56d1 Update release-pypi.yml (#1725) 2025-12-16 20:30:03 +08:00
Jianwei Dong
3126b8deaa Update release-pypi.yml (#1724) 2025-12-16 17:45:29 +08:00
Jianwei Dong
6dcacd9daf Update release-pypi.yml (#1723) 2025-12-16 17:42:28 +08:00
Jianwei Dong
fe8049d3a9 Update release-pypi.yml (#1722) 2025-12-16 17:35:29 +08:00
Jianwei Dong
5ff0026fc1 Update release-pypi.yml (#1721) 2025-12-16 17:29:29 +08:00
Jianwei Dong
1f79f6da92 [feat](kt-kernel): Add automatic deployment workflow (#1719) 2025-12-16 15:20:06 +08:00
Shaoxu Cheng
f25e58ad69 fix: qwen3-npu bugs; update: add readme-for-qwen3-npu (#1717)
* fix: qwen3-npu bugs; update: add readme-for-qwen3-npu

* fix: Correct the README description
2025-12-16 14:27:04 +08:00
RICHARDNAN
18fb8fc897 Npu revise benchmark results and prerequisites (#1716)
* Update DeepseekR1_V3_tutorial_zh_for_Ascend_NPU.md

* Update DeepseekR1_V3_tutorial_zh_for_Ascend_NPU.md

* Revise Ascend NPU tutorial for Docker deployment

Updated the tutorial for deploying on the Ascend NPU, changing sections from 'Conda部署' (Conda deployment) to '镜像部署' (image deployment) and providing specific commands for Docker container setup and Python environment installation.

* Update DeepseekR1 tutorial for Ascend NPU

* Update DeepseekR1_V3_tutorial_zh_for_Ascend_NPU.md

* Update W8A8 weight link in tutorial

* Update doc/zh/DeepseekR1_V3_tutorial_zh_for_Ascend_NPU.md

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Refactor Docker command and update package manager

Updated Docker run command to simplify device specifications and corrected package manager command from 'apt' to 'yum'.

* Update DeepseekR1_V3_tutorial_zh_for_Ascend_NPU.md

* Revise benchmark results and prerequisites

Updated performance results and hardware specifications.

* Update doc/zh/DeepseekR1_V3_tutorial_zh_for_Ascend_NPU.md

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-16 14:26:44 +08:00
ZiWei Yuan
34230eaf44 [docs]: Fix image link in README.md (#1718)
Updated image link to use raw GitHub URL for better accessibility.
2025-12-15 17:10:15 +08:00
SCDESPERTATE
008de19e16 [fix](kt-kernel): drop the weights held in Python for loading weights operation in C++ (#1695) 2025-12-12 11:42:33 +08:00
Shaoxu Cheng
1e69563363 update: add cache class and ascend ln mlp op for qwen3 adapt npu (#1708) 2025-12-11 17:08:35 +08:00
Shaoxu Cheng
cea490a326 update: add ascend attn and experts ops for npu qwen3moe adapt (#1707)
* update: add ascend attn and experts ops for npu qwen3moe adapt

* Reorder import statements in custom_ascend_modelling_qwen3.py

* Restore copyright and import statements

Restored copyright information and imports in ascend_experts.py.
2025-12-11 17:08:15 +08:00
Shaoxu Cheng
adcfa9080f update: Qwen3 MoE model adaptation for NPU (framework) (#1706) 2025-12-11 17:07:57 +08:00
ZiWei Yuan
53f6a6d6e1 [feat]: patch kml problem (#1704) 2025-12-11 14:40:29 +08:00
Jianwei Dong
c65febe05c [feat]: Automatically detect whether blis is installed on amd cpus (#1702) 2025-12-11 14:25:36 +08:00
RICHARDNAN
6431888928 add deploy in docker image (#1691) 2025-12-11 14:11:27 +08:00
ZiWei Yuan
2f1b743050 [docs]: update website doc png (#1696) 2025-12-11 13:01:32 +08:00
Oql
e87a042ef0 [fix](kt-kernel): fix write_buffer do numa job (#1699) 2025-12-10 16:39:16 +08:00
Shaoxu Cheng
8995378a91 update: add attention and ln ut for npu (#1698) 2025-12-10 16:12:26 +08:00
mrhaoxx
f992de55da [fix](kt-sft): fix peft adaptations for RL tasks (#1674) 2025-12-09 14:28:51 +08:00
mrhaoxx
503295fc88 [feat](kt-kernel): refactor convert_cpu_weights.py to support conversion for GLM-4.6V (#1687)
Signed-off-by: mrhaoxx <mr.haoxx@gmail.com>
2025-12-09 14:24:41 +08:00
Oql
ac69ea891e Fix K2 MoE decode bug in buffer management (#1686) 2025-12-08 21:08:28 +08:00
Oql
8139c092bf Reduce CPU memory usage during large chunk prefill (Fixes #1676) (#1683)
* fix(amx): add BufferASmallKGroupImpl to fix buffer overflow in from_mat

The original BufferAKGroupImpl::from_mat writes 64 bytes per K_STEP iteration
but when K_STEP=32 (for GemmKernel224Int4SmallKGroup), this causes buffer overflow.

BufferASmallKGroupImpl overrides from_mat to write only 32 bytes per iteration.

* perf(k2-moe): optimize memory allocation with pooled buffers

- Replace per-expert buffer allocation with shared memory pools
- Dynamically assign buffer slices based on activated experts
- Add group_size inference from scale tensor shape in amx.py

* delete kimi k2 forward test

* add TODO comment for pool_count_ calculation
2025-12-08 20:19:07 +08:00
ErvinXie
eefc8cf98d Update Kimi-K2-Thinking-Native.md (#1684) 2025-12-08 19:58:20 +08:00
Jiaqi Liao
f20e5d1da5 Revise prefill strategy and performance metrics (#1675)
Updated the prefill strategy descriptions and performance benchmarks in the documentation.
2025-12-06 15:36:04 +08:00
Jiaqi Liao
1d62ac21f7 Update Kimi-K2-Thinking-Native.md (#1673) 2025-12-05 23:08:02 +08:00
Jiaqi Liao
69fa7b1a57 Revise installation steps in Kimi-K2 documentation (#1672)
Updated installation instructions and added steps for cloning the repository.
2025-12-05 23:05:24 +08:00
Jiaqi Liao
721b6c4c94 [docs] Update Native Kimi-K2-Thinking documentation and kt-kernel parameters (#1671) v0.4.3 2025-12-05 22:46:16 +08:00
Jiaqi Liao
47da806cde [doc](kt-kernel): add kimi-k2-thinking (#1670) 2025-12-05 21:53:59 +08:00
ErvinXie
71f683acec Support Native Kimi K2 Thinking (#1663)
* [feat]: fix k2 prefill

* Update Kimi-K2-Thinking.md

* Create Kimi-K2-Thinking-Native.md

* Update Kimi-K2-Thinking.md

* Update Kimi-K2-Thinking.md

* Update Kimi-K2-Thinking-Native.md

* [perf] optimize K2 MoE weight loading with per-expert pointers

- Avoid expensive torch.stack().contiguous() in Python (was ~6.6s)
- Use per-expert pointer arrays (gate_projs) instead of contiguous memory
- C++ worker pool performs parallel memcpy for TP slicing
- Add LOAD_TIME_PROFILE for load_weights timing analysis

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: ouqingliang <1692110604@qq.com>
Co-authored-by: Claude <noreply@anthropic.com>
2025-12-05 21:53:05 +08:00
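The loading optimization in the commit above swaps one large Python-side copy for an array of raw pointers that C++ workers copy in parallel. A hedged illustration of the two paths, using numpy as a stand-in for torch (names like `as_pointer_array` are invented):

```python
import numpy as np

def as_pointer_array(experts: list[np.ndarray]) -> list[int]:
    """Collect one raw data pointer per expert; no bytes move here."""
    return [e.ctypes.data for e in experts]

experts = [np.ones((4, 4), dtype=np.int8) * i for i in range(8)]

# Slow path: one big contiguous copy on the Python side
# (the ~6.6s torch.stack().contiguous() the commit removes).
stacked = np.stack(experts).copy()

# Fast path: pass pointers only; the actual memcpy for TP slicing
# is deferred to a parallel C++ worker pool.
gate_projs = as_pointer_array(experts)
```

The pointer list costs O(num_experts) regardless of weight size, while the stacked copy costs O(total bytes), which is where the saving comes from.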
ZiWei Yuan
4850424345 [docs]: add amd blis backend usage guide (#1669) 2025-12-05 16:52:26 +08:00
Jiaqi Liao
1ca3a2662e Add 9#AISoft to the list of contributors (#1668) 2025-12-05 15:44:04 +08:00
Jiaqi Liao
0698252484 [fix](kt-kernel): gate RAWINT4 behind AVX512 and avoid AVX2 build break (#1660) 2025-12-03 00:43:23 +08:00
Jianwei Dong
670c488155 [docs]: Add deepseek-v3.2 run tutorial (#1659) 2025-12-02 20:04:10 +08:00
Jiaqi Liao
fcf8882075 [Feature] Add avx-based kimi-k2 support (#1656)
* support Kimi-K2-Thinking original weight
fix amx kernel bug

* update k2 avx kernel.

* feat: add CPUInfer write buffer task

* [feat]: add kimi k2 cpu write buffer support

- Implement write_weights_to_buffer function in k2-moe.hpp for extracting GPU expert weights
- Fix down (w2) weight column-wise slicing for different TP configurations
- Support three TP scenarios: cpu_tp == gpu_tp, cpu_tp > gpu_tp, cpu_tp < gpu_tp
- Add comprehensive test cases for weight extraction validation
- Ensure compatibility with Kimi model's MoE architecture

* [fix]: correct write_weight_scale_to_buffer expert offset calculation

Fixed the bug in write_weight_scale_to_buffer_task where expert offsets in GPU buffers were incorrectly calculated. Changed from using per_expert_gpu sizes to using full gpu_tp sizes, ensuring correct memory layout for multi-expert scenarios.

Also added benchmark scripts for k2 moe and write buffer operations, and cleaned up debug output in test files.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* [feat]: add write buffer wrapper

* [fix] fix comment

---------

Co-authored-by: ouqingliang <1692110604@qq.com>
Co-authored-by: Claude <noreply@anthropic.com>
2025-12-02 16:01:07 +08:00
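The commit above slices the down (w2) projection column-wise so CPU and GPU tensor-parallel splits can differ. A hedged sketch of that slicing, with illustrative shapes and names:

```python
import numpy as np

def w2_slice(w2: np.ndarray, tp_rank: int, tp_size: int) -> np.ndarray:
    """Columns [rank * cols/tp, (rank+1) * cols/tp) of the down proj."""
    cols = w2.shape[1]
    assert cols % tp_size == 0, "columns must divide evenly across ranks"
    step = cols // tp_size
    return w2[:, tp_rank * step:(tp_rank + 1) * step]

w2 = np.arange(4 * 8).reshape(4, 8)

# cpu_tp == gpu_tp: rank 1 of 2 takes the right half of the columns.
a = w2_slice(w2, 1, 2)
# cpu_tp > gpu_tp: a finer split (rank 2 of 4) lands inside that half,
# so a CPU rank can always recover its range from the GPU layout.
b = w2_slice(w2, 2, 4)
```

Because the split is purely by column index, the three scenarios in the commit (cpu_tp equal to, greater than, or less than gpu_tp) all reduce to taking a sub-range or union of these column intervals.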
ZiWei Yuan
c2b8c60c4e [ci]: add int4_1 & int4_1k (#1653)
* [feat]: init amd adaption

* [feat]: add blis support

* [fix]: fix setup and moe kernel wrapper

* [fix](setup.py): support rebuild with cache and import kt_kernel works fine

* [feat]: add moe_kernel converter for amd and implement the load method (haven't tested yet)

* [feat](moe_kernel/moe.hpp): delete unused memory when using save

* [fix](moe_kernel): update PLAIN for pack

* [fix](moe_kernel): rm printf debug

* [fix](moe_kernel): skip gpu experts

* [fix](moe_kernel/moe.hpp): update include memory path

* [feat](moe_kernel/moe.hpp): support expert deferral

* [feat]: finish amd

* [ci]: add int4_1 & int4_1k

---------

Co-authored-by: mrhaoxx <mr.haoxx@gmail.com>
2025-12-02 15:58:14 +08:00
Jianwei Dong
fd78fe520a fix(scripts): resolve OOM when converting gpu weights and update README (#1640) 2025-12-01 14:15:14 +08:00
Peilin Li
e637fedc65 [docs]: Add Full introduction of KT (#1636) 2025-11-29 15:46:55 +08:00
Peilin Li
7ee80bbc3d [docs]: Update README with Python 3.12 and dependency changes (#1634)
Updated Python version in installation instructions and adjusted KTransformers and flash-attention wheel filenames accordingly.
2025-11-29 15:46:05 +08:00