Commit Graph

1166 Commits

Author SHA1 Message Date
ErvinXie
d8046e1bb4 Kt minimax (#1742)
[feat]: fp8 kernel and kt-cli support
2025-12-24 15:39:44 +08:00
mrhaoxx
e7d277d163 [docs]: refine README for dpo updates (#1740)
* [docs]: refine dpo tutorial

* [docs]: refine README for dpo updates

* Update doc/en/DPO_tutorial.md

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* [docs]: update website doc & refine location

---------

Co-authored-by: ErvinXie <ervinxie@foxmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: ZiWei Yuan <yzwliam@126.com>
2025-12-24 11:20:08 +08:00
mrhaoxx
dee1e211d5 [docs]: refine dpo tutorial (#1739) 2025-12-22 18:44:24 +08:00
Peilin Li
0bce173e3b [feat]: Release version to 0.4.4 (#1738) v0.4.4 2025-12-22 11:20:40 +08:00
Peilin Li
16d5d89f50 [docs]: Update Python version options in DPO tutorial (#1734) 2025-12-20 13:44:35 +08:00
Peilin Li
df998e0f36 [docs]: Add RL-DPO Tutorial (#1733) 2025-12-20 12:49:02 +08:00
Jianwei Dong
39449ed1af update PyPI Install and readme (#1731) 2025-12-18 17:21:47 +08:00
Jiaqi Liao
3c134359bc Fix CPU Instruction Set and Installation (#1729)
* [fix](kt-kernel): fix AVX512 cpu instruction set detection

* [feat](kt-kernel): AVX512 fallback kernel for RAW-INT4

* [fix](kt-kernel): fix setup version issue

* [fix](kt-kernel): update install for custom build

* [docs](kt-kernel): new installation guide for various cpu instruction sets

* [fix](kt-kernel): fix _mm512_dpbusd_epi32_compat fallback implementation

* [style](kt-kernel): clang format
2025-12-18 00:11:57 +08:00
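The fallback commit above targets `_mm512_dpbusd_epi32`, an AVX512-VNNI instruction. As a hedged sketch (not the repo's actual code), its per-lane semantics can be emulated in plain Python: each 32-bit lane accumulates the dot product of four unsigned 8-bit values with four signed 8-bit values.

```python
# Illustrative emulation of VPDPBUSD semantics; function names are
# invented here, not taken from kt-kernel.

def dpbusd_lane(acc: int, a_u8: list[int], b_s8: list[int]) -> int:
    """One 32-bit lane: acc += sum of four u8*s8 products."""
    assert len(a_u8) == len(b_s8) == 4
    return acc + sum(u * s for u, s in zip(a_u8, b_s8))

def dpbusd_512(acc: list[int], a_u8: list[int], b_s8: list[int]) -> list[int]:
    """A full 512-bit register: 16 lanes of 4 bytes each."""
    return [dpbusd_lane(acc[i], a_u8[4 * i:4 * i + 4], b_s8[4 * i:4 * i + 4])
            for i in range(16)]
```

A scalar loop like this is what a non-VNNI fallback kernel has to reproduce exactly, byte ordering and sign handling included.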
ErvinXie
a8667ddb58 [fix](test): fix import kt-kernel (#1728) 2025-12-17 19:46:32 +08:00
SCDESPERTATE
6fc4080a7d [fix](kt-kernel): fix typo in moe-tp's forward time-profiling (#1720)
* [fix](kt-kernel): fix typo in moe-tp's forward time-profiling

* [fix](kt-kernel): fix the experts count in profiling

---------

Co-authored-by: KMSorSMS <yzwliam@126.com>
2025-12-17 12:06:33 +08:00
Jianwei Dong
661e19a8e5 Update release-pypi.yml (#1726) 2025-12-16 20:37:20 +08:00
Jianwei Dong
5ba3fb56d1 Update release-pypi.yml (#1725) 2025-12-16 20:30:03 +08:00
Jianwei Dong
3126b8deaa Update release-pypi.yml (#1724) 2025-12-16 17:45:29 +08:00
Jianwei Dong
6dcacd9daf Update release-pypi.yml (#1723) 2025-12-16 17:42:28 +08:00
Jianwei Dong
fe8049d3a9 Update release-pypi.yml (#1722) 2025-12-16 17:35:29 +08:00
Jianwei Dong
5ff0026fc1 Update release-pypi.yml (#1721) 2025-12-16 17:29:29 +08:00
Jianwei Dong
1f79f6da92 [feat](kt-kernel): Add automatic deployment workflow (#1719) 2025-12-16 15:20:06 +08:00
Shaoxu Cheng
f25e58ad69 fix: qwen3-npu bugs; update: add readme-for-qwen3-npu (#1717)
* fix: qwen3-npu bugs; update: add readme-for-qwen3-npu

* fix: Correct the README description
2025-12-16 14:27:04 +08:00
RICHARDNAN
18fb8fc897 Npu revise benchmark results and prerequisites (#1716)
* Update DeepseekR1_V3_tutorial_zh_for_Ascend_NPU.md

* Update DeepseekR1_V3_tutorial_zh_for_Ascend_NPU.md

* Revise Ascend NPU tutorial for Docker deployment

Updated the tutorial for deploying on the Ascend NPU, changing sections from 'Conda部署' (Conda deployment) to '镜像部署' (image deployment) and providing specific commands for Docker container setup and Python environment installation.

* Update DeepseekR1 tutorial for Ascend NPU

* Update DeepseekR1_V3_tutorial_zh_for_Ascend_NPU.md

* Update W8A8 weight link in tutorial

* Update doc/zh/DeepseekR1_V3_tutorial_zh_for_Ascend_NPU.md

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Refactor Docker command and update package manager

Updated Docker run command to simplify device specifications and corrected package manager command from 'apt' to 'yum'.

* Update DeepseekR1_V3_tutorial_zh_for_Ascend_NPU.md

* Revise benchmark results and prerequisites

Updated performance results and hardware specifications.

* Update doc/zh/DeepseekR1_V3_tutorial_zh_for_Ascend_NPU.md

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-16 14:26:44 +08:00
ZiWei Yuan
34230eaf44 [docs]: Fix image link in README.md (#1718)
Updated image link to use raw GitHub URL for better accessibility.
2025-12-15 17:10:15 +08:00
SCDESPERTATE
008de19e16 [fix](kt-kernel): drop the weights held in Python for loading weights operation in C++ (#1695) 2025-12-12 11:42:33 +08:00
Shaoxu Cheng
1e69563363 update: add cache class and ascend ln mlp op for qwen3 adapt npu (#1708) 2025-12-11 17:08:35 +08:00
Shaoxu Cheng
cea490a326 update: add ascend attn and experts ops for npu qwen3moe adapt (#1707)
* update: add ascend attn and experts ops for npu qwen3moe adapt

* Reorder import statements in custom_ascend_modelling_qwen3.py

* Restore copyright and import statements

Restored copyright information and imports in ascend_experts.py.
2025-12-11 17:08:15 +08:00
Shaoxu Cheng
adcfa9080f update: Qwen3 MoE model adaptation for NPU (framework) (#1706) 2025-12-11 17:07:57 +08:00
ZiWei Yuan
53f6a6d6e1 [feat]: patch kml problem (#1704) 2025-12-11 14:40:29 +08:00
Jianwei Dong
c65febe05c [feat]: Automatically detect whether blis is installed on amd cpus (#1702) 2025-12-11 14:25:36 +08:00
RICHARDNAN
6431888928 add deploy in docker image (#1691) 2025-12-11 14:11:27 +08:00
ZiWei Yuan
2f1b743050 [docs]: update website doc png (#1696) 2025-12-11 13:01:32 +08:00
Oql
e87a042ef0 [fix](kt-kernel): fix write_buffer do numa job (#1699) 2025-12-10 16:39:16 +08:00
Shaoxu Cheng
8995378a91 update: add attention and ln ut for npu (#1698) 2025-12-10 16:12:26 +08:00
mrhaoxx
f992de55da [fix](kt-sft): fix peft adaptations for RL tasks (#1674) 2025-12-09 14:28:51 +08:00
mrhaoxx
503295fc88 [feat](kt-kernel): refactor convert_cpu_weights.py to support conversion for GLM-4.6V (#1687)
Signed-off-by: mrhaoxx <mr.haoxx@gmail.com>
2025-12-09 14:24:41 +08:00
Oql
ac69ea891e Fix K2 MoE decode bug in buffer management (#1686) 2025-12-08 21:08:28 +08:00
Oql
8139c092bf Reduce CPU memory usage during large chunk prefill (Fixes #1676) (#1683)
* fix(amx): add BufferASmallKGroupImpl to fix buffer overflow in from_mat

The original BufferAKGroupImpl::from_mat writes 64 bytes per K_STEP iteration
but when K_STEP=32 (for GemmKernel224Int4SmallKGroup), this causes buffer overflow.

BufferASmallKGroupImpl overrides from_mat to write only 32 bytes per iteration.

* perf(k2-moe): optimize memory allocation with pooled buffers

- Replace per-expert buffer allocation with shared memory pools
- Dynamically assign buffer slices based on activated experts
- Add group_size inference from scale tensor shape in amx.py

* delete kimi k2 forward test

* add TODO comment for pool_count_ calculation
2025-12-08 20:19:07 +08:00
ErvinXie
eefc8cf98d Update Kimi-K2-Thinking-Native.md (#1684) 2025-12-08 19:58:20 +08:00
Jiaqi Liao
f20e5d1da5 Revise prefill strategy and performance metrics (#1675)
Updated the prefill strategy descriptions and performance benchmarks in the documentation.
2025-12-06 15:36:04 +08:00
Jiaqi Liao
1d62ac21f7 Update Kimi-K2-Thinking-Native.md (#1673) 2025-12-05 23:08:02 +08:00
Jiaqi Liao
69fa7b1a57 Revise installation steps in Kimi-K2 documentation (#1672)
Updated installation instructions and added steps for cloning the repository.
2025-12-05 23:05:24 +08:00
Jiaqi Liao
721b6c4c94 [docs] Update Native Kimi-K2-Thinking documentation and kt-kernel parameters (#1671) v0.4.3 2025-12-05 22:46:16 +08:00
Jiaqi Liao
47da806cde [doc](kt-kernel): add kimi-k2-thinking (#1670) 2025-12-05 21:53:59 +08:00
ErvinXie
71f683acec Support Native Kimi K2 Thinking (#1663)
* [feat]: fix k2 prefill

* Update Kimi-K2-Thinking.md

* Create Kimi-K2-Thinking-Native.md

* Update Kimi-K2-Thinking.md

* Update Kimi-K2-Thinking.md

* Update Kimi-K2-Thinking-Native.md

* [perf] optimize K2 MoE weight loading with per-expert pointers

- Avoid expensive torch.stack().contiguous() in Python (was ~6.6s)
- Use per-expert pointer arrays (gate_projs) instead of contiguous memory
- C++ worker pool performs parallel memcpy for TP slicing
- Add LOAD_TIME_PROFILE for load_weights timing analysis

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: ouqingliang <1692110604@qq.com>
Co-authored-by: Claude <noreply@anthropic.com>
2025-12-05 21:53:05 +08:00
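The loading optimization in the commit above swaps one large Python-side copy for an array of raw pointers that C++ workers copy in parallel. A hedged illustration of the two paths, using numpy as a stand-in for torch (names like `as_pointer_array` are invented):

```python
import numpy as np

def as_pointer_array(experts: list[np.ndarray]) -> list[int]:
    """Collect one raw data pointer per expert; no bytes move here."""
    return [e.ctypes.data for e in experts]

experts = [np.ones((4, 4), dtype=np.int8) * i for i in range(8)]

# Slow path: one big contiguous copy on the Python side
# (the ~6.6s torch.stack().contiguous() the commit removes).
stacked = np.stack(experts).copy()

# Fast path: pass pointers only; the actual memcpy for TP slicing
# is deferred to a parallel C++ worker pool.
gate_projs = as_pointer_array(experts)
```

The pointer list costs O(num_experts) regardless of weight size, while the stacked copy costs O(total bytes), which is where the saving comes from.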
ZiWei Yuan
4850424345 [docs]: add amd blis backend usage guide (#1669) 2025-12-05 16:52:26 +08:00
Jiaqi Liao
1ca3a2662e Add 9#AISoft to the list of contributors (#1668) 2025-12-05 15:44:04 +08:00
Jiaqi Liao
0698252484 [fix](kt-kernel): gate RAWINT4 behind AVX512 and avoid AVX2 build break (#1660) 2025-12-03 00:43:23 +08:00
Jianwei Dong
670c488155 [docs]: Add deepseek-v3.2 run tutorial (#1659) 2025-12-02 20:04:10 +08:00
Jiaqi Liao
fcf8882075 [Feature] Add avx-based kimi-k2 support (#1656)
* support Kimi-K2-Thinking original weight
fix amx kernel bug

* update k2 avx kernel.

* feat: add CPUInfer write buffer task

* [feat]: add kimi k2 cpu write buffer support

- Implement write_weights_to_buffer function in k2-moe.hpp for extracting GPU expert weights
- Fix down (w2) weight column-wise slicing for different TP configurations
- Support three TP scenarios: cpu_tp == gpu_tp, cpu_tp > gpu_tp, cpu_tp < gpu_tp
- Add comprehensive test cases for weight extraction validation
- Ensure compatibility with Kimi model's MoE architecture

* [fix]: correct write_weight_scale_to_buffer expert offset calculation

Fixed the bug in write_weight_scale_to_buffer_task where expert offsets in GPU buffers were incorrectly calculated. Changed from using per_expert_gpu sizes to using full gpu_tp sizes, ensuring correct memory layout for multi-expert scenarios.

Also added benchmark scripts for k2 moe and write buffer operations, and cleaned up debug output in test files.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* [feat]: add write buffer wrapper

* [fix] fix comment

---------

Co-authored-by: ouqingliang <1692110604@qq.com>
Co-authored-by: Claude <noreply@anthropic.com>
2025-12-02 16:01:07 +08:00
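The commit above slices the down (w2) projection column-wise so CPU and GPU tensor-parallel splits can differ. A hedged sketch of that slicing, with illustrative shapes and names:

```python
import numpy as np

def w2_slice(w2: np.ndarray, tp_rank: int, tp_size: int) -> np.ndarray:
    """Columns [rank * cols/tp, (rank+1) * cols/tp) of the down proj."""
    cols = w2.shape[1]
    assert cols % tp_size == 0, "columns must divide evenly across ranks"
    step = cols // tp_size
    return w2[:, tp_rank * step:(tp_rank + 1) * step]

w2 = np.arange(4 * 8).reshape(4, 8)

# cpu_tp == gpu_tp: rank 1 of 2 takes the right half of the columns.
a = w2_slice(w2, 1, 2)
# cpu_tp > gpu_tp: a finer split (rank 2 of 4) lands inside that half,
# so a CPU rank can always recover its range from the GPU layout.
b = w2_slice(w2, 2, 4)
```

Because the split is purely by column index, the three scenarios in the commit (cpu_tp equal to, greater than, or less than gpu_tp) all reduce to taking a sub-range or union of these column intervals.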
ZiWei Yuan
c2b8c60c4e [ci]: add int4_1 & int4_1k (#1653)
* [feat]: init amd adaption

* [feat]: add blis support

* [fix]: fix setup and moe kernel wrapper

* [fix](setup.py): support rebuild with cache and import kt_kernel works fine

* [feat]: add moe_kernel converter for amd and implement the load method (haven't tested yet)

* [feat](moe_kernel/moe.hpp): delete unused memory when using save

* [fix](moe_kernel): update PLAIN for pack

* [fix](moe_kernel): rm printf debug

* [fix](moe_kernel): skip gpu experts

* [fix](moe_kernel/moe.hpp): update include memory path

* [feat](moe_kernel/moe.hpp): support expert deferral

* [feat]: finish amd

* [ci]: add int4_1 & int4_1k

---------

Co-authored-by: mrhaoxx <mr.haoxx@gmail.com>
2025-12-02 15:58:14 +08:00
Jianwei Dong
fd78fe520a fix(scripts): resolve OOM when converting gpu weights and update README (#1640) 2025-12-01 14:15:14 +08:00
Peilin Li
e637fedc65 [docs]: Add Full introduction of KT (#1636) 2025-11-29 15:46:55 +08:00
Peilin Li
7ee80bbc3d [docs]: Update README with Python 3.12 and dependency changes (#1634)
Updated Python version in installation instructions and adjusted KTransformers and flash-attention wheel filenames accordingly.
2025-11-29 15:46:05 +08:00