ktransformers

mirror of https://github.com/kvcache-ai/ktransformers.git synced 2026-03-14 18:37:23 +00:00

Author	SHA1	Message	Date
SCDESPERTATE	b0f827d2a9	[chore](cuda): explicitly use `ele_per_blk` var for better readability (#1784 )	2026-01-23 11:11:08 +08:00
Jianwei Dong	779bf14556	[doc]: add Experts sched tutorial (#1802 ) * Change num gpu experts to gpu expert masks and add eplb statistics * [feat]: update examples * [fix]: fix fp8 perchannel * Delete useless tests * Delete useless tests * add experts_sched tutorial --------- Co-authored-by: ouqingliang <1692110604@qq.com> v0.5.1	2026-01-22 15:40:07 +08:00
Peilin Li	a4de664e62	Add AutoDL Tutorial (#1801 )	2026-01-22 14:52:47 +08:00
ErvinXie	d2305538f7	Modify installation steps in Kimi-K2-Thinking-Native.md (#1800 ) Updated installation instructions for sglang repository.	2026-01-21 15:46:21 +08:00
mrhaoxx	b27de4068b	[fix]: fix exp_avx512 for act_fn (#1797 )	2026-01-20 11:07:22 +08:00
Jianwei Dong	027832c590	[feat](kt-kernel): CPU-GPU experts sched (#1796 )	2026-01-16 17:01:15 +08:00
Oql	6277da4c2b	support GLM 4.7 (#1791 ) support GLM 4.7	2026-01-13 17:36:25 +08:00
watamario15	667030d6e6	[kt-kernel]: Fix ignored build configurations in `install.sh` and `CMakeLists.txt` (#1789 ) * Correct variable defaults * Remove CMAKE_BUILD_TYPE setting in CMakeLists	2026-01-12 22:16:19 +08:00
Oql	5edc456749	support Native BF16 format MoE. (#1788 ) support Native BF16 format MoE	2026-01-12 14:43:28 +08:00
Oql	ddb957596f	Fix moe bug. (#1783 ) * [fix]: fix moe.hpp load from file bug. * [fix]: fix all moe hpp init bug. * [fix]: fix moe & awq-moe bug.	2026-01-05 17:02:24 +08:00
Oql	dc6394e501	[fix]: fix moe hpp bug. (#1780 ) fix moe hpp init bug.	2026-01-04 19:32:56 +08:00
ZiWei Yuan	ad7674a6d5	[ci]: Patch ci (#1772 ) * [docs]: add kt-cli doc and update corresponding website * [feat]: update issue template	2025-12-31 12:10:20 +08:00
Jianwei Dong	6d2d7cb057	bump to 0.5.0.post1 (#1771 ) v0.5.0.post1	2025-12-30 11:09:54 +08:00
Jianwei Dong	47b1bfcff6	Update release-pypi.yml (#1770 )	2025-12-30 10:47:28 +08:00
Jianwei Dong	9adc91714f	Remove kt-kernel-cuda, kt-kernel uses the version with cuda (#1769 )	2025-12-30 10:23:58 +08:00
ZiWei Yuan	b096b01fbc	[docs]: add kt-cli doc and update corresponding website (#1768 )	2025-12-29 23:06:22 +08:00
ErvinXie	9539ab91eb	Cli (#1765 ) * [feat]: add custom option for kt run * [feat]: depth 3	2025-12-29 15:18:42 +08:00
Jianwei Dong	4b235cdaa4	fix cuda wheel build (#1766 )	2025-12-29 12:42:06 +08:00
Jianwei Dong	7c127d9fd0	Update release-pypi.yml (#1764 )	2025-12-29 11:48:55 +08:00
Jianwei Dong	559a3ad4ac	fix pypi cuda install (#1763 )	2025-12-29 11:19:43 +08:00
Oql	63796374c1	[docs]: fix and add MiniMax-M2 tutorial images. (#1752 )	2025-12-25 20:14:35 +08:00
Jiaqi Liao	be668074de	Update tutorial links in README.md (#1749 )	2025-12-25 14:26:10 +08:00
Jiaqi Liao	46b0f36980	[feat](kt-kernel): Fix CPU instruction set variants for build & install (#1746 ) * [feat]: Enhance CPU feature detection and support for AVX512 extensions - Added cmake/DetectCPU.cmake for automatic CPU feature detection. - Updated CMakeLists.txt to include auto-detection logic for AVX512 features. - Modified install.sh to include new AVX512_VBMI option for FP8 MoE. - Enhanced _cpu_detect.py to support progressive matching of CPU variants. - Created scripts/check_cpu_features.py for manual CPU feature checks. - Updated setup.py to reflect changes in CPU variant building and environment variables. * [fix](kt-kernel): Add conditional inclusion of FP8 MoE for AVX512 BF16 support * [chore](kt-kernel): update project version to 0.5.0 in CMakeLists.txt and version.py	2025-12-24 18:57:45 +08:00
ZiWei Yuan	dc5feece8f	[docs]: update doc link (#1745 )	2025-12-24 18:00:47 +08:00
ZiWei Yuan	3315335fb1	[docs]: update docs to kt-kernel & add amd_blis doc (#1744 )	2025-12-24 17:55:15 +08:00
Oql	1d7a476523	[fix]: fix version to 0.5.0 (#1743 ) v0.5.0	2025-12-24 17:33:22 +08:00
ErvinXie	d8046e1bb4	Kt minimax (#1742 ) [feat]: fp8 kernel and kt-cli support	2025-12-24 15:39:44 +08:00
mrhaoxx	e7d277d163	[docs]: refine README for dpo updates (#1740 ) * [docs]: refine dpo tutorial * [docs]: refine README for dpo updates * Update doc/en/DPO_tutorial.md Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * [docs]: update website doc & refine location --------- Co-authored-by: ErvinXie <ervinxie@foxmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: ZiWei Yuan <yzwliam@126.com>	2025-12-24 11:20:08 +08:00
mrhaoxx	dee1e211d5	[docs]: refine dpo tutorial (#1739 )	2025-12-22 18:44:24 +08:00
Peilin Li	0bce173e3b	[feat]: Release version to 0.4.4 (#1738 ) v0.4.4	2025-12-22 11:20:40 +08:00
Peilin Li	16d5d89f50	[docs]: Update Python version options in DPO tutorial (#1734 )	2025-12-20 13:44:35 +08:00
Peilin Li	df998e0f36	[docs]: Add RL-DPO Tutorial (#1733 )	2025-12-20 12:49:02 +08:00
Jianwei Dong	39449ed1af	update PyPI Install and readme (#1731 )	2025-12-18 17:21:47 +08:00
Jiaqi Liao	3c134359bc	Fix CPU Instruction Set and Installation (#1729 ) * [fix](kt-kernel): fix AVX512 cpu instruction set detection * [feat](kt-kernel): AVX512 fallback kernel for RAW-INT4 * [fix](kt-kernel): fix setup version issue * [fix](kt-kernel): update install for custom build * [docs](kt-kernel): new installation guide for various cpu instruction set * [fix](kt-kernel): fix _mm512_dpbusd_epi32_compat fallback implementation * [style](kt-kernel): clang format	2025-12-18 00:11:57 +08:00
ErvinXie	a8667ddb58	[fix](test): fix import kt-kernel (#1728 )	2025-12-17 19:46:32 +08:00
SCDESPERTATE	6fc4080a7d	[fix](kt-kernel): fix typo in moe-tp's forward time-profiling (#1720 ) * [fix](kt-kernel): fix typo in moe-tp's forward time-profiling * [fix](kt-kernel): fix the experts count in profiling --------- Co-authored-by: KMSorSMS <yzwliam@126.com>	2025-12-17 12:06:33 +08:00
Jianwei Dong	661e19a8e5	Update release-pypi.yml (#1726 )	2025-12-16 20:37:20 +08:00
Jianwei Dong	5ba3fb56d1	Update release-pypi.yml (#1725 )	2025-12-16 20:30:03 +08:00
Jianwei Dong	3126b8deaa	Update release-pypi.yml (#1724 )	2025-12-16 17:45:29 +08:00
Jianwei Dong	6dcacd9daf	Update release-pypi.yml (#1723 )	2025-12-16 17:42:28 +08:00
Jianwei Dong	fe8049d3a9	Update release-pypi.yml (#1722 )	2025-12-16 17:35:29 +08:00
Jianwei Dong	5ff0026fc1	Update release-pypi.yml (#1721 )	2025-12-16 17:29:29 +08:00
Jianwei Dong	1f79f6da92	[feat](kt-kernel): Add automatic deployment workflow (#1719 )	2025-12-16 15:20:06 +08:00
Shaoxu Cheng	f25e58ad69	fix: qwen3-npu bugs; update: add readme-for-qwen3-npu (#1717 ) * fix: qwen3-npu bugs; update: add readme-for-qwen3-npu * fix: Correct the README description	2025-12-16 14:27:04 +08:00
RICHARDNAN	18fb8fc897	Npu revise benchmark results and prerequisites (#1716 ) * Update DeepseekR1_V3_tutorial_zh_for_Ascend_NPU.md * Update DeepseekR1_V3_tutorial_zh_for_Ascend_NPU.md * Revise Ascend NPU tutorial for Docker deployment Updated the tutorial for deploying the Ascend NPU, changing sections from 'Conda部署' (Conda deployment) to '镜像部署' (image deployment) and providing specific commands for Docker container setup and Python environment installation. * Update DeepseekR1 tutorial for Ascend NPU * Update DeepseekR1_V3_tutorial_zh_for_Ascend_NPU.md * Update W8A8 weight link in tutorial * Update doc/zh/DeepseekR1_V3_tutorial_zh_for_Ascend_NPU.md Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * Refactor Docker command and update package manager Updated Docker run command to simplify device specifications and corrected package manager command from 'apt' to 'yum'. * Update DeepseekR1_V3_tutorial_zh_for_Ascend_NPU.md * Revise benchmark results and prerequisites Updated performance results and hardware specifications. * Update doc/zh/DeepseekR1_V3_tutorial_zh_for_Ascend_NPU.md Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-12-16 14:26:44 +08:00
ZiWei Yuan	34230eaf44	[docs]: Fix image link in README.md (#1718 ) Updated image link to use raw GitHub URL for better accessibility.	2025-12-15 17:10:15 +08:00
SCDESPERTATE	008de19e16	[fix](kt-kernel): drop the weights held in Python for loading weights operation in C++ (#1695 )	2025-12-12 11:42:33 +08:00
Shaoxu Cheng	1e69563363	update: add cache class and ascend ln mlp op for qwen3 adapt npu (#1708 )	2025-12-11 17:08:35 +08:00
Shaoxu Cheng	cea490a326	update: add ascend attn and experts ops for npu qwen3moe adapt (#1707 ) * update: add ascend attn and experts ops for npu qwen3moe adapt * Reorder import statements in custom_ascend_modelling_qwen3.py * Restore copyright and import statements Restored copyright information and imports in ascend_experts.py.	2025-12-11 17:08:15 +08:00
Shaoxu Cheng	adcfa9080f	update: Qwen3 MoE model adaptation for NPU (framework) (#1706 )	2025-12-11 17:07:57 +08:00

1192 Commits