242 Commits

Author SHA1 Message Date
Jianwei Dong
15c624dcae Fix/sglang kt detection (#1875)
* [feat]: simplify sglang installation with submodule, auto-sync CI, and version alignment

- Add kvcache-ai/sglang as git submodule at third_party/sglang (branch = main)
- Add top-level install.sh for one-click source installation (sglang + kt-kernel)
- Add sglang-kt as hard dependency in kt-kernel/pyproject.toml
- Add CI workflow to auto-sync sglang submodule daily and create PR
- Add CI workflow to build and publish sglang-kt to PyPI
- Integrate sglang-kt build into release-pypi.yml (version.py bump publishes both packages)
- Align sglang-kt version with ktransformers via SGLANG_KT_VERSION env var injection
- Update Dockerfile to use submodule and inject aligned version
- Update all 13 doc files, CLI hints, and i18n strings to reference new install methods
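The version-alignment step in the list above can be sketched in a few lines. `SGLANG_KT_VERSION` is the environment variable named in the commit; `resolve_sglang_kt_version` and its fallback default are hypothetical illustration, not the actual build code.

```python
import os

def resolve_sglang_kt_version(default: str = "0.0.0") -> str:
    """Return the version injected via SGLANG_KT_VERSION, else a fallback.

    The commit describes the release workflow exporting this variable so
    sglang-kt is published with the same version string as ktransformers;
    the default here is a placeholder for illustration only.
    """
    return os.environ.get("SGLANG_KT_VERSION", default)
```

A build script would call this once and write the result into the package metadata, so both wheels always carry matching versions.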

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [build]: bump version to 0.5.2

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [build]: rename PyPI package from kt-kernel to ktransformers

Users can now `pip install ktransformers` to get everything
(sglang-kt is auto-installed as a dependency).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Revert "[build]: rename PyPI package from kt-kernel to ktransformers"

This reverts commit e0cbbf6364.

* [build]: add ktransformers meta-package for PyPI

`pip install ktransformers` now works as a single install command.
It pulls kt-kernel (which in turn pulls sglang-kt).
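A meta-package like this carries no code of its own, so whether an environment actually resolved the dependency chain can be checked from stdlib package metadata. `direct_dependencies` is a hypothetical helper, not part of ktransformers:

```python
from importlib import metadata

def direct_dependencies(package: str) -> list[str]:
    """Return a distribution's declared Requires-Dist entries, or [] if
    the distribution is not installed. For the meta-package described
    above, the list would name kt-kernel, which in turn pulls sglang-kt.
    """
    try:
        return metadata.requires(package) or []
    except metadata.PackageNotFoundError:
        return []
```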

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [fix]: show sglang-kt package version in kt version command

- Prioritize sglang-kt package version (aligned with ktransformers)
  over sglang internal __version__
- Update display name from "sglang" to "sglang-kt"

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [fix]: improve sglang-kt detection in kt doctor and kt version

Recognize sglang-kt package name as proof of kvcache-ai fork installation.
Previously both commands fell through to "PyPI (not recommended)" for
non-editable local source installs. Now version.py reuses the centralized
check_sglang_installation() logic.
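The detection order the message describes could look roughly like this. `detect_sglang` is a hypothetical stand-in for the centralized `check_sglang_installation()` logic, not the actual implementation:

```python
from importlib import metadata

def detect_sglang() -> tuple[str, str]:
    """Prefer the sglang-kt distribution (its version is aligned with
    ktransformers); fall back to vanilla sglang only when the fork's
    package metadata is absent."""
    for dist, label in (
        ("sglang-kt", "sglang-kt"),
        ("sglang", "sglang (PyPI, not recommended)"),
    ):
        try:
            return label, metadata.version(dist)
        except metadata.PackageNotFoundError:
            continue
    return "sglang", "not installed"
```

Keying on the distribution name rather than `sglang.__version__` is what lets non-editable local source installs still register as the kvcache-ai fork.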

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [build]: bump version to 0.5.2.post1

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-04 16:54:48 +08:00
Jianwei Dong
16a8b98f3e support qwen3.5 (#1846) 2026-02-16 15:48:14 +08:00
Jiaqi Liao
a3d5d53605 Update MiniMax-M2.5.md (#1849) 2026-02-13 22:35:36 +08:00
Jianwei Dong
f0e4fc612b support minimax-m2.5 (#1848) 2026-02-13 19:15:44 +08:00
Oql
1c72b3f5bd fix glm5 docs (#1845) 2026-02-12 02:33:37 +08:00
Oql
7f7aeaeff6 support glm 5 (#1844) 2026-02-12 02:03:32 +08:00
Jiaqi Liao
061fb56382 Update Kimi-K2.5.md (#1838) 2026-02-07 16:38:39 +08:00
Oql
4f64665758 [docs]: add Qwen3 Coder Next Tutorial (#1833) 2026-02-04 16:27:10 +08:00
Jiaqi Liao
794c04fae4 Revert "[doc]: update kimi_k2.5 doc (#1823)" (#1825)
This reverts commit 2e6506535b.
2026-01-30 16:10:01 +08:00
Oql
ccbb5b1cf8 [docs]: add clawd bot docs
2026-01-30 15:51:30 +08:00
Jiaqi Liao
2e6506535b [doc]: update kimi_k2.5 doc (#1823) 2026-01-30 15:43:18 +08:00
Peilin Li
8321d00cc5 Add files via upload (#1814) 2026-01-27 17:44:50 +08:00
Jiaqi Liao
2f6f7f1921 Kimi k2.5 doc (#1812)
* [doc]: add Kimi-K2.5 deploy&sft guide

* [doc]: add Kimi-K2.5 deploy&sft guide
2026-01-27 13:33:25 +08:00
Jiaqi Liao
1da075a3fa Revert "[doc]: add Kimi-K2.5 deploy&sft guide (#1810)" (#1811)
This reverts commit a368140d76.
2026-01-27 10:05:13 +08:00
Jiaqi Liao
a368140d76 [doc]: add Kimi-K2.5 deploy&sft guide (#1810) 2026-01-27 10:02:59 +08:00
Oql
bf4c8a690b Add Native Precision Tutorial, update worker strategy and README.md (#1807) 2026-01-23 18:00:13 +08:00
Jianwei Dong
8652346e69 [fix]: doc (#1805) 2026-01-23 11:17:08 +08:00
Jianwei Dong
779bf14556 [doc]: add Experts sched tutorial (#1802)
* Change num gpu experts to GPU expert masks and add EPLB statistics

* [feat]: update examples

* [fix]: fix fp8 perchannel

* Delete useless tests

* Delete useless tests

* add experts_sched tutorial

---------

Co-authored-by: ouqingliang <1692110604@qq.com>
2026-01-22 15:40:07 +08:00
Peilin Li
a4de664e62 Add AutoDL Tutorial (#1801) 2026-01-22 14:52:47 +08:00
ErvinXie
d2305538f7 Modify installation steps in Kimi-K2-Thinking-Native.md (#1800)
Updated installation instructions for sglang repository.
2026-01-21 15:46:21 +08:00
ZiWei Yuan
b096b01fbc [docs]: add kt-cli doc and update corresponding website (#1768) 2025-12-29 23:06:22 +08:00
Oql
63796374c1 [docs]: fix and add MiniMax-M2 tutorial images. (#1752) 2025-12-25 20:14:35 +08:00
ZiWei Yuan
3315335fb1 [docs]: update docs to kt-kernel & add amd_blis doc (#1744) 2025-12-24 17:55:15 +08:00
ErvinXie
d8046e1bb4 Kt minimax (#1742)
[feat]: fp8 kernel and kt-cli support
2025-12-24 15:39:44 +08:00
mrhaoxx
e7d277d163 [docs]: refine README for dpo updates (#1740)
* [docs]: refine dpo tutorial

* [docs]: refine README for dpo updates

* Update doc/en/DPO_tutorial.md

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* [docs]: update website doc & refine location

---------

Co-authored-by: ErvinXie <ervinxie@foxmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: ZiWei Yuan <yzwliam@126.com>
2025-12-24 11:20:08 +08:00
mrhaoxx
dee1e211d5 [docs]: refine dpo tutorial (#1739) 2025-12-22 18:44:24 +08:00
Peilin Li
16d5d89f50 [docs]: Update Python version options in DPO tutorial (#1734) 2025-12-20 13:44:35 +08:00
Peilin Li
df998e0f36 [docs]: Add RL-DPO Tutorial (#1733) 2025-12-20 12:49:02 +08:00
Shaoxu Cheng
f25e58ad69 fix: qwen3-npu bugs; update: add readme-for-qwen3-npu (#1717)
* fix: qwen3-npu bugs; update: add readme-for-qwen3-npu

* fix: Correct the README description
2025-12-16 14:27:04 +08:00
RICHARDNAN
18fb8fc897 Npu revise benchmark results and prerequisites (#1716)
* Update DeepseekR1_V3_tutorial_zh_for_Ascend_NPU.md

* Update DeepseekR1_V3_tutorial_zh_for_Ascend_NPU.md

* Revise Ascend NPU tutorial for Docker deployment

Updated the tutorial for deploying on the Ascend NPU, changing the 'Conda部署' (Conda deployment) section to '镜像部署' (image deployment) and providing specific commands for Docker container setup and Python environment installation.

* Update DeepseekR1 tutorial for Ascend NPU

* Update DeepseekR1_V3_tutorial_zh_for_Ascend_NPU.md

* Update W8A8 weight link in tutorial

* Update doc/zh/DeepseekR1_V3_tutorial_zh_for_Ascend_NPU.md

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Refactor Docker command and update package manager

Updated Docker run command to simplify device specifications and corrected package manager command from 'apt' to 'yum'.

* Update DeepseekR1_V3_tutorial_zh_for_Ascend_NPU.md

* Revise benchmark results and prerequisites

Updated performance results and hardware specifications.

* Update doc/zh/DeepseekR1_V3_tutorial_zh_for_Ascend_NPU.md

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-16 14:26:44 +08:00
RICHARDNAN
6431888928 add deploy in docker image (#1691) 2025-12-11 14:11:27 +08:00
ErvinXie
eefc8cf98d Update Kimi-K2-Thinking-Native.md (#1684) 2025-12-08 19:58:20 +08:00
Jiaqi Liao
f20e5d1da5 Revise prefill strategy and performance metrics (#1675)
Updated the prefill strategy descriptions and performance benchmarks in the documentation.
2025-12-06 15:36:04 +08:00
Jiaqi Liao
1d62ac21f7 Update Kimi-K2-Thinking-Native.md (#1673) 2025-12-05 23:08:02 +08:00
Jiaqi Liao
69fa7b1a57 Revise installation steps in Kimi-K2 documentation (#1672)
Updated installation instructions and added steps for cloning the repository.
2025-12-05 23:05:24 +08:00
Jiaqi Liao
721b6c4c94 [docs] Update Native Kimi-K2-Thinking documentation and kt-kernel parameters (#1671) 2025-12-05 22:46:16 +08:00
Jiaqi Liao
47da806cde [doc](kt-kernel): add kimi-k2-thinking (#1670) 2025-12-05 21:53:59 +08:00
ErvinXie
71f683acec Support Native Kimi K2 Thinking (#1663)
* [feat]: fix k2 prefill

* Update Kimi-K2-Thinking.md

* Create Kimi-K2-Thinking-Native.md

* Update Kimi-K2-Thinking.md

* Update Kimi-K2-Thinking.md

* Update Kimi-K2-Thinking-Native.md

* [perf] optimize K2 MoE weight loading with per-expert pointers

- Avoid expensive torch.stack().contiguous() in Python (was ~6.6s)
- Use per-expert pointer arrays (gate_projs) instead of contiguous memory
- C++ worker pool performs parallel memcpy for TP slicing
- Add LOAD_TIME_PROFILE for load_weights timing analysis
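The per-expert-pointer idea above can be illustrated with stdlib arrays (the real code uses torch tensors and names like `gate_projs`; everything below is a simplified stand-in): instead of stacking all expert weights into one contiguous copy, hand the native side one base address per expert.

```python
from array import array

def expert_pointers(expert_weights: list[array]) -> list[int]:
    """One raw base address per expert. A native worker pool (the C++
    side in the commit) could memcpy from these addresses in parallel
    for TP slicing, skipping the expensive stack-and-copy in Python."""
    return [w.buffer_info()[0] for w in expert_weights]

# Four experts, each with its own independently allocated weight buffer.
experts = [array("f", [0.0] * 64) for _ in range(4)]
ptrs = expert_pointers(experts)
```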

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: ouqingliang <1692110604@qq.com>
Co-authored-by: Claude <noreply@anthropic.com>
2025-12-05 21:53:05 +08:00
Jianwei Dong
670c488155 [docs]: Add deepseek-v3.2 run tutorial (#1659) 2025-12-02 20:04:10 +08:00
Peilin Li
e637fedc65 [docs]: Add Full introduction of KT (#1636) 2025-11-29 15:46:55 +08:00
RICHARDNAN
2cffdf7033 [docs]: Update DeepseekR1_V3_tutorial_zh_for_Ascend_NPU.md (#1638) 2025-11-24 11:51:07 +08:00
Jiaqi Liao
46af8fcab5 [doc] fix kt parameters (#1629) 2025-11-19 16:41:57 +08:00
Peilin Li
171578a7ec [refactor]: Change named 'KT-SFT' to 'kt-sft' (#1626)
* Change named 'KT-SFT' to 'kt-sft'

* [docs]: update kt-sft name

---------

Co-authored-by: ZiWei Yuan <yzwliam@126.com>
2025-11-17 11:48:42 +08:00
ZiWei Yuan
ab8ad0a110 [docs]: update web doc (#1625) 2025-11-16 14:40:22 +08:00
ZiWei Yuan
be6db6f46b [docs]: improve structure for kt-kernel (#1624)
* [docs]: improve structure for kt-kernel

* Update doc/en/kt-kernel/README.md
2025-11-16 13:21:41 +08:00
ZiWei Yuan
133eea037c [docs]: improve docs structure (#1623) 2025-11-16 12:40:59 +08:00
ZiWei Yuan
c2d2edbeef [docs]: update the web docs structure (#1622) 2025-11-16 12:09:44 +08:00
ZiWei Yuan
c32fefb1cd [doc]: update web doc and kt-kernel doc (#1609)
* [doc]: update web doc and kt-kernel doc

* [doc](book.toml): add book.toml for rust book compile
2025-11-13 20:44:13 +08:00
Peilin Li
148a030026 Upload hands-on tutorial for KTransformers-FT, especially on customizing KT-FT + LLaMA-Factory (#1597)
* Add files via upload

* upload hands-on tutorial for KTransformers-FT
2025-11-11 20:54:41 +08:00
Jiaqi Liao
57d14d22bc Refactor: restructure repository to focus on kt-kernel and KT-SFT modules (#1581)
* refactor: move legacy code to archive/ directory

  - Moved ktransformers, csrc, third_party, merge_tensors to archive/
  - Moved build scripts and configurations to archive/
  - Kept kt-kernel, KT-SFT, doc, and README files in root
  - Preserved complete git history for all moved files

* refactor: restructure repository to focus on kt-kernel and KT-SFT modules

* fix README

* fix README

* fix README

* fix README

* docs: add performance benchmarks to kt-kernel section

Add comprehensive performance data for kt-kernel to match KT-SFT's presentation:
- AMX kernel optimization: 21.3 TFLOPS (3.9× faster than PyTorch)
- Prefill phase: up to 20× speedup vs baseline
- Decode phase: up to 4× speedup
- NUMA optimization: up to 63% throughput improvement
- Multi-GPU (8×L20): 227.85 tokens/s total throughput with DeepSeek-R1 FP8

Source: https://lmsys.org/blog/2025-10-22-KTransformers/

This provides users with concrete performance metrics for both core modules,
making it easier to understand the capabilities of each component.

* refactor: improve kt-kernel performance data with specific hardware and models

Replace generic performance descriptions with concrete benchmarks:
- Specify exact hardware: 8×L20 GPU + Xeon Gold 6454S, Single/Dual-socket Xeon + AMX
- Include specific models: DeepSeek-R1-0528 (FP8), DeepSeek-V3 (671B)
- Show detailed metrics: total throughput, output throughput, concurrency details
- Match KT-SFT presentation style for consistency

This provides users with actionable performance data they can use to evaluate
hardware requirements and expected performance for their use cases.

* fix README

* docs: clean up performance table and improve formatting

* add pic for README

* refactor: simplify .gitmodules and backup legacy submodules

- Remove 7 legacy submodules from root .gitmodules (archive/third_party/*)
- Keep only 2 active submodules for kt-kernel (llama.cpp, pybind11)
- Backup complete .gitmodules to archive/.gitmodules
- Add documentation in archive/README.md for researchers who need legacy submodules

This reduces initial clone size by ~500MB and avoids downloading unused dependencies.

* refactor: move doc/ back to root directory

Keep documentation in root for easier access and maintenance.

* refactor: consolidate all images to doc/assets/

- Move kt-kernel/assets/heterogeneous_computing.png to doc/assets/
- Remove KT-SFT/assets/ (images already in doc/assets/)
- Update KT-SFT/README.md image references to ../doc/assets/
- Eliminates ~7.9MB image duplication
- Centralizes all documentation assets in one location

* fix pic path for README
2025-11-10 17:42:26 +08:00