ktransformers

mirror of https://github.com/kvcache-ai/ktransformers.git synced 2026-03-15 02:47:22 +00:00

Author	SHA1	Message	Date
Peilin Li	171578a7ec	[refactor]: Change named 'KT-SFT' to 'kt-sft' (#1626) * Change named 'KT-SFT' to 'kt-sft' * [docs]: update kt-sft name --------- Co-authored-by: ZiWei Yuan <yzwliam@126.com>	2025-11-17 11:48:42 +08:00
ZiWei Yuan	550e4986f5	[docs]: update README.md (#1616) * [docs]: update README.md	2025-11-15 20:56:26 +08:00
ZiWei Yuan	7c2ad6dbca	[docs]: update README.md (#1614) * [docs]: update README.md	2025-11-15 18:34:27 +08:00
ErvinXie	5179f0d634	Add roadmap link to README (#1585)	2025-11-10 18:15:53 +08:00
Jiaqi Liao	07322ca2bd	Refactor: restructure repository to focus on kt-kernel and KT-SFT modules (#1583) * refactor repo * fix README	2025-11-10 17:57:48 +08:00
ErvinXie	2cb1674020	Fix image reference in README.md (#1584) Updated image reference in README for heterogeneous computing.	2025-11-10 17:53:41 +08:00
Jiaqi Liao	57d14d22bc	Refactor: restructure repository to focus on kt-kernel and KT-SFT modulesq recon (#1581) * refactor: move legacy code to archive/ directory - Moved ktransformers, csrc, third_party, merge_tensors to archive/ - Moved build scripts and configurations to archive/ - Kept kt-kernel, KT-SFT, doc, and README files in root - Preserved complete git history for all moved files * refactor: restructure repository to focus on kt-kernel and KT-SFT modules * fix README * fix README * fix README * fix README * docs: add performance benchmarks to kt-kernel section Add comprehensive performance data for kt-kernel to match KT-SFT's presentation: - AMX kernel optimization: 21.3 TFLOPS (3.9× faster than PyTorch) - Prefill phase: up to 20× speedup vs baseline - Decode phase: up to 4× speedup - NUMA optimization: up to 63% throughput improvement - Multi-GPU (8×L20): 227.85 tokens/s total throughput with DeepSeek-R1 FP8 Source: https://lmsys.org/blog/2025-10-22-KTransformers/ This provides users with concrete performance metrics for both core modules, making it easier to understand the capabilities of each component. * refactor: improve kt-kernel performance data with specific hardware and models Replace generic performance descriptions with concrete benchmarks: - Specify exact hardware: 8×L20 GPU + Xeon Gold 6454S, Single/Dual-socket Xeon + AMX - Include specific models: DeepSeek-R1-0528 (FP8), DeepSeek-V3 (671B) - Show detailed metrics: total throughput, output throughput, concurrency details - Match KT-SFT presentation style for consistency This provides users with actionable performance data they can use to evaluate hardware requirements and expected performance for their use cases. * fix README * docs: clean up performance table and improve formatting * add pic for README * refactor: simplify .gitmodules and backup legacy submodules - Remove 7 legacy submodules from root .gitmodules (archive/third_party/) - Keep only 2 active submodules for kt-kernel (llama.cpp, pybind11) - Backup complete .gitmodules to archive/.gitmodules - Add documentation in archive/README.md for researchers who need legacy submodules This reduces initial clone size by ~500MB and avoids downloading unused dependencies. refactor: move doc/ back to root directory Keep documentation in root for easier access and maintenance. * refactor: consolidate all images to doc/assets/ - Move kt-kernel/assets/heterogeneous_computing.png to doc/assets/ - Remove KT-SFT/assets/ (images already in doc/assets/) - Update KT-SFT/README.md image references to ../doc/assets/ - Eliminates ~7.9MB image duplication - Centralizes all documentation assets in one location * fix pic path for README	2025-11-10 17:42:26 +08:00
Atream	86229c852d	Add update for Kimi-K2-Thinking support	2025-11-06 17:56:46 +08:00
ovowei	44e47ad75a	update readme.md	2025-11-05 23:30:58 +08:00
ovowei	00f038e763	update readme.md	2025-11-05 23:29:59 +08:00
ovowei	1e17d75bfd	fix	2025-10-30 10:47:05 +08:00
ovowei	ca21992e46	update readme.md. (Support Ascend NPU)	2025-10-27 20:53:06 +08:00
Atream	8ef6111ae0	Update README with Citation link	2025-10-10 19:12:31 +08:00
Atream	1e48eab7d5	Add citation section to README Added citation section with reference to KTransformers paper.	2025-10-10 18:59:29 +08:00
Atream	e93abc93ec	Add SGLang Integration to README.md	2025-10-10 18:50:05 +08:00
Jianwei Dong	d4b3fe2427	Merge branch 'main' into support-qwen3next	2025-09-12 21:59:32 +08:00
djw	a44b710649	support qwen3 next	2025-09-11 11:55:09 +00:00
Azure	24fe61bbc3	Update date for Kimi-K2-0905 support	2025-09-05 17:47:17 +08:00
Azure-Tang	b6d36bffbb	update kimi-k2-0905	2025-09-05 03:52:43 +00:00
qiyuxinlin	1334ddc833	update readme	2025-07-25 17:02:36 +00:00
Atream	cf79c93fae	Update README.md	2025-07-11 09:35:12 +08:00
Atream	18690d819f	Update README.md	2025-07-11 09:34:07 +08:00
ErvinXie	aadf31b35d	Update README.md	2025-06-30 17:55:49 +08:00
ErvinXie	a9a72e52c3	Update README.md	2025-06-30 14:56:46 +08:00
liam Yuan	22d0d9ccb2	✨ update vendor ZTE name	2025-06-23 21:07:17 +08:00
liam Yuan	cb77b52c63	✨ update vendor support list	2025-06-23 21:00:01 +08:00
Atream	d051a14941	Update README.md	2025-05-15 10:29:43 +08:00
rnwang04	142fb7ce6c	Enable support for Intel XPU devices, add support for DeepSeek V2/V3 first	2025-05-14 19:37:27 +00:00
Atream	7ebf82a492	Update Qwen3 date	2025-04-29 09:43:13 +08:00
qiyuxinlin	a3ba63665a	update readme	2025-04-28 22:38:41 +00:00
qiyuxinlin	89823ccb1f	update readme	2025-04-28 22:34:47 +00:00
qiyuxinlin	e7763a4b59	update readme	2025-04-28 22:32:35 +00:00
qiyuxinlin	d3ebdafd4b	update readme	2025-04-28 22:31:09 +00:00
qiyuxinlin	59b0631e33	update readme	2025-04-28 22:26:38 +00:00
qiyuxinlin	8f76c37d86	fix readme	2025-04-28 22:17:22 +00:00
qiyuxinlin	cb5617b479	update readme	2025-04-28 22:14:23 +00:00
djw	26798500bd	update llama4 tutorial	2025-04-09 09:40:08 +00:00
djw	f73b4ca706	update llama4 tutorial	2025-04-09 09:36:30 +00:00
dongjw	4ed9744ebb	update readme	2025-04-02 14:02:57 +08:00
dongjw	b62cefaec9	update readme	2025-04-02 13:11:01 +08:00
Azure-Tang	3a5330b215	Merge branch 'main' into work-concurrent	2025-04-01 06:48:19 +00:00
Atream	25cee5810e	add balance-serve, support concurrence	2025-03-31 22:55:32 +08:00
Atream	d4c6c2bb02	Update README.md	2025-03-22 12:14:36 +08:00
liam	4748a912e2	📝 fix typo ktransformer->ktransformers	2025-03-17 17:54:00 +08:00
Azure-Tang	e5b001d76f	Update readme; Format code; Add example yaml.	2025-03-14 14:25:52 -04:00
Azure	034a116365	update readme	2025-03-05 10:04:43 +00:00
Azure	91c1619296	Merge branch 'develop-0.2.2' into support-fp8 Update README.md	2025-02-25 13:43:26 +00:00
Azure	36fbeee341	Update doc	2025-02-25 08:21:18 +00:00
_	5ed441a0f5	Update README.md	2025-02-21 14:15:50 +00:00
liam	13382f88ab	📝 update V0.2.1 Doc	2025-02-15 16:17:05 +08:00

100 Commits