ktransformers

mirror of https://github.com/kvcache-ai/ktransformers.git synced 2026-03-14 18:37:23 +00:00

Author	SHA1	Message	Date
Jianwei Dong	15c624dcae	Fix/sglang kt detection (#1875) * [feat]: simplify sglang installation with submodule, auto-sync CI, and version alignment - Add kvcache-ai/sglang as git submodule at third_party/sglang (branch = main) - Add top-level install.sh for one-click source installation (sglang + kt-kernel) - Add sglang-kt as hard dependency in kt-kernel/pyproject.toml - Add CI workflow to auto-sync sglang submodule daily and create PR - Add CI workflow to build and publish sglang-kt to PyPI - Integrate sglang-kt build into release-pypi.yml (version.py bump publishes both packages) - Align sglang-kt version with ktransformers via SGLANG_KT_VERSION env var injection - Update Dockerfile to use submodule and inject aligned version - Update all 13 doc files, CLI hints, and i18n strings to reference new install methods Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * [build]: bump version to 0.5.2 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * [build]: rename PyPI package from kt-kernel to ktransformers Users can now `pip install ktransformers` to get everything (sglang-kt is auto-installed as a dependency). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Revert "[build]: rename PyPI package from kt-kernel to ktransformers" This reverts commit `e0cbbf6364`. * [build]: add ktransformers meta-package for PyPI `pip install ktransformers` now works as a single install command. It pulls kt-kernel (which in turn pulls sglang-kt). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * [fix]: show sglang-kt package version in kt version command - Prioritize sglang-kt package version (aligned with ktransformers) over sglang internal __version__ - Update display name from "sglang" to "sglang-kt" Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * [fix]: improve sglang-kt detection in kt doctor and kt version Recognize sglang-kt package name as proof of kvcache-ai fork installation. Previously both commands fell through to "PyPI (not recommended)" for non-editable local source installs. Now version.py reuses the centralized check_sglang_installation() logic. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * [build]: bump version to 0.5.2.post1 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-04 16:54:48 +08:00
Jiaqi Liao	57d14d22bc	Refactor: restructure repository to focus on kt-kernel and KT-SFT modulesq recon (#1581) * refactor: move legacy code to archive/ directory - Moved ktransformers, csrc, third_party, merge_tensors to archive/ - Moved build scripts and configurations to archive/ - Kept kt-kernel, KT-SFT, doc, and README files in root - Preserved complete git history for all moved files * refactor: restructure repository to focus on kt-kernel and KT-SFT modules * fix README * fix README * fix README * fix README * docs: add performance benchmarks to kt-kernel section Add comprehensive performance data for kt-kernel to match KT-SFT's presentation: - AMX kernel optimization: 21.3 TFLOPS (3.9× faster than PyTorch) - Prefill phase: up to 20× speedup vs baseline - Decode phase: up to 4× speedup - NUMA optimization: up to 63% throughput improvement - Multi-GPU (8×L20): 227.85 tokens/s total throughput with DeepSeek-R1 FP8 Source: https://lmsys.org/blog/2025-10-22-KTransformers/ This provides users with concrete performance metrics for both core modules, making it easier to understand the capabilities of each component. * refactor: improve kt-kernel performance data with specific hardware and models Replace generic performance descriptions with concrete benchmarks: - Specify exact hardware: 8×L20 GPU + Xeon Gold 6454S, Single/Dual-socket Xeon + AMX - Include specific models: DeepSeek-R1-0528 (FP8), DeepSeek-V3 (671B) - Show detailed metrics: total throughput, output throughput, concurrency details - Match KT-SFT presentation style for consistency This provides users with actionable performance data they can use to evaluate hardware requirements and expected performance for their use cases. * fix README * docs: clean up performance table and improve formatting * add pic for README * refactor: simplify .gitmodules and backup legacy submodules - Remove 7 legacy submodules from root .gitmodules (archive/third_party/) - Keep only 2 active submodules for kt-kernel (llama.cpp, pybind11) - Backup complete .gitmodules to archive/.gitmodules - Add documentation in archive/README.md for researchers who need legacy submodules This reduces initial clone size by ~500MB and avoids downloading unused dependencies. refactor: move doc/ back to root directory Keep documentation in root for easier access and maintenance. * refactor: consolidate all images to doc/assets/ - Move kt-kernel/assets/heterogeneous_computing.png to doc/assets/ - Remove KT-SFT/assets/ (images already in doc/assets/) - Update KT-SFT/README.md image references to ../doc/assets/ - Eliminates ~7.9MB image duplication - Centralizes all documentation assets in one location * fix pic path for README	2025-11-10 17:42:26 +08:00
RICHARDNAN	c89959fe1d	Update setup.py	2025-09-08 17:18:57 +08:00
RICHARDNAN	a0229b220c	Sync scripts from 0.2.4npu	2025-09-03 15:40:28 +08:00
Alisehen	f3b1e36b6a	bug fix	2025-05-15 10:01:51 +00:00
rnwang04	2f6e14a54b	fix md typo, fix code style, and update setup value error message	2025-05-15 10:14:39 +00:00
rnwang04	142fb7ce6c	Enable support for Intel XPU devices, add support for DeepSeek V2/V3 first	2025-05-14 19:37:27 +00:00
jzl	9a759e9fb8	fix: make cpufeature a local import	2025-04-25 11:42:38 +08:00
jizhilong	0638ea298d	feat(build): display limited tail of subprocesses in real time this is a followup on #1108	2025-04-15 16:40:38 +08:00
jizhilong	690d4d42f9	chore: show cmake output in real time during build_ext otherwise cmake error messages may be suppressed, making debugging difficult	2025-04-10 21:33:04 +08:00
Atream	9dd24ecd72	fix compile, add abi check to setup.py	2025-04-08 06:18:30 +00:00
Atream	ec12429c46	Merge pull request #1005 from fishingfly/improve/backend-error-msg fix: refine backend error message to include ROCM_HOME	2025-04-02 14:54:23 +08:00
fishingfly	7549ff335a	fix: refine backend error message to include ROCM_HOME Signed-off-by: fishingfly <zhoyuzf@163.com>	2025-04-01 10:50:38 +08:00
Atream	25cee5810e	add balance-serve, support concurrence	2025-03-31 22:55:32 +08:00
Azure-Tang	ed8437413b	merge main; Add torch q8 linear	2025-03-14 05:52:07 -04:00
Xiaodong Ye	18b1d18367	musa: support bf16 Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>	2025-02-23 10:19:19 +08:00
Atream	94ab2de3b9	Merge pull request #523 from miaooo0000OOOO/main optimize CMake multi core parallel	2025-02-22 17:38:18 +08:00
Azure	25c5bddd08	Merge pull request #506 from makllama/musa feat: Support Moore Threads GPU	2025-02-20 22:50:31 +08:00
miaooo0000OOOO	0cdc446a19	Enable parallel compilation by default	2025-02-20 11:09:16 +08:00
Xiaodong Ye	2207f6cd14	feat: Support Moore Threads GPU Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>	2025-02-19 18:26:55 +08:00
hoshinohikari	6d9182795c	Fix cmake error caused by lack of environment variables in Windows environment	2025-02-17 15:56:56 +08:00
liam	098602b08f	⚡ v0.2 ongoing	2025-02-09 22:41:14 +08:00
chenxl	1db4a67dca	[feature] add github action for pre compile	2024-08-14 16:54:50 +00:00
chenxl	f5f79f5c0e	[ADD] support multi-gpu qlen>1 q5_k	2024-08-12 11:41:26 +00:00
chenxl	112cb3c962	[feature] support python 310 and multi instruction	2024-07-31 13:58:17 +00:00
chenxl	dd18a11cab	[feature] support for pypi install	2024-07-29 11:51:28 +00:00
chenxl	18c42e67df	Initial commit	2024-07-27 16:06:58 +08:00

27 Commits
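Two of the commits above (690d4d42f9 and 0638ea298d) change the build so that CMake and other subprocess output is shown in real time, with only a limited tail displayed, because otherwise CMake error messages may be suppressed. The log does not show how setup.py implements this, so the following is only a minimal sketch under that assumption: it streams each line as it arrives and keeps the last N lines in a bounded collections.deque so they can be attached to the error if the command fails. The helper run_with_tail and its parameters are hypothetical and are not part of ktransformers.

```python
import collections
import subprocess
import sys


def _echo_to_stdout(line):
    # Default sink: write each line immediately so build errors are not hidden.
    sys.stdout.write(line + "\n")
    sys.stdout.flush()


def run_with_tail(cmd, tail_lines=20, echo=_echo_to_stdout):
    """Hypothetical helper: run `cmd`, echo its output line by line as it
    arrives, and keep only the last `tail_lines` lines for error reporting."""
    tail = collections.deque(maxlen=tail_lines)  # bounded: oldest lines are dropped
    with subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so errors appear in order
        text=True,
        bufsize=1,  # line-buffered on the reading side of the pipe
    ) as proc:
        for line in proc.stdout:
            line = line.rstrip("\r\n")
            echo(line)
            tail.append(line)
        returncode = proc.wait()
    if returncode != 0:
        # Attach only the retained tail, not the full (possibly huge) log.
        raise subprocess.CalledProcessError(returncode, cmd, output="\n".join(tail))
    return list(tail)


if __name__ == "__main__":
    # Example: stream a short child process, then show the retained tail.
    last = run_with_tail([sys.executable, "-c", "for i in range(50): print(i)"], tail_lines=5)
    print("tail:", last)
```

In a build_ext step one might call, for example, run_with_tail(["cmake", "--build", "build"]) (the build directory name is illustrative), which streams the whole log and, on failure, raises with only the last 20 lines attached. Using deque(maxlen=...) keeps memory bounded no matter how long the log grows. A child process that block-buffers its own stdout when writing to a pipe (Python without -u, for instance) will still appear in bursts.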