10 Commits

Author SHA1 Message Date
acture
9b2d3b687b fix: remove broken symlink in archive/ktransformers/ (#1906)
The symlink `archive/ktransformers/ktransformers` points to
`/home/djw/py311_717/ktransformers/ktransformers`, an absolute path
on a developer's local machine. It was introduced in #1581 during the
repository restructuring and is broken on every other machine.

Tools that recursively copy the repo tree (e.g. shutil.copytree)
fail with FileNotFoundError on this dangling link.
2026-04-09 11:42:19 +08:00
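The failure mode described in the commit above can be checked for before copying a tree. The following is a minimal sketch, not part of the commit itself: `find_dangling_symlinks` is a hypothetical helper name, and the exact exception a copy tool raises may differ (for example, `shutil.copytree` typically aggregates per-file failures into `shutil.Error`).

```python
import os


def find_dangling_symlinks(root):
    """Return a sorted list of symlinks under `root` whose targets do not exist.

    Hypothetical helper for illustration; it is not taken from the repository.
    """
    dangling = []
    # os.walk does not follow symlinked directories by default,
    # so a link pointing outside the tree is reported but never traversed.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # islink() inspects the link itself; exists() follows it,
            # so a link that exists() rejects is dangling.
            if os.path.islink(path) and not os.path.exists(path):
                dangling.append(path)
    return sorted(dangling)


if __name__ == "__main__":
    # Scan the current directory and report any broken links found.
    for link in find_dangling_symlinks("."):
        print(f"dangling symlink: {link} -> {os.readlink(link)}")
```

When a dangling link cannot be removed, `shutil.copytree` also accepts `symlinks=True` (copy links as links) or `ignore_dangling_symlinks=True` (skip them) as workarounds.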
Shaoxu Cheng
f25e58ad69 fix: qwen3-npu bugs; update: add readme-for-qwen3-npu (#1717)
* fix: qwen3-npu bugs; update: add readme-for-qwen3-npu

* fix: Correct the README description
2025-12-16 14:27:04 +08:00
Shaoxu Cheng
1e69563363 update: add cache class and ascend ln mlp op for qwen3 adapt npu (#1708)
2025-12-11 17:08:35 +08:00
Shaoxu Cheng
cea490a326 update: add ascend attn and experts ops for npu qwen3moe adapt (#1707)
* update: add ascend attn and experts ops for npu qwen3moe adapt

* Reorder import statements in custom_ascend_modelling_qwen3.py

* Restore copyright and import statements

Restored copyright information and imports in ascend_experts.py.
2025-12-11 17:08:15 +08:00
Shaoxu Cheng
adcfa9080f update: Qwen3 MoE model adaptation for NPU (framework) (#1706)
2025-12-11 17:07:57 +08:00
Shaoxu Cheng
8995378a91 update: add attention and ln ut for npu (#1698)
2025-12-10 16:12:26 +08:00
Peilin Li
171578a7ec [refactor]: Rename 'KT-SFT' to 'kt-sft' (#1626)
* Rename 'KT-SFT' to 'kt-sft'

* [docs]: update kt-sft name

---------

Co-authored-by: ZiWei Yuan <yzwliam@126.com>
2025-11-17 11:48:42 +08:00
Jiaqi Liao
956d19d2d8 Fix git submodule (#1586)
* refactor repo

* fix README

* fix gitsubmodule
2025-11-10 19:10:13 +08:00
Jiaqi Liao
07322ca2bd Refactor: restructure repository to focus on kt-kernel and KT-SFT modules (#1583)
* refactor repo

* fix README
2025-11-10 17:57:48 +08:00
Jiaqi Liao
57d14d22bc Refactor: restructure repository to focus on kt-kernel and KT-SFT modules (#1581)
* refactor: move legacy code to archive/ directory

  - Moved ktransformers, csrc, third_party, merge_tensors to archive/
  - Moved build scripts and configurations to archive/
  - Kept kt-kernel, KT-SFT, doc, and README files in root
  - Preserved complete git history for all moved files

* refactor: restructure repository to focus on kt-kernel and KT-SFT modules

* fix README

* fix README

* fix README

* fix README

* docs: add performance benchmarks to kt-kernel section

Add comprehensive performance data for kt-kernel to match KT-SFT's presentation:
- AMX kernel optimization: 21.3 TFLOPS (3.9× faster than PyTorch)
- Prefill phase: up to 20× speedup vs baseline
- Decode phase: up to 4× speedup
- NUMA optimization: up to 63% throughput improvement
- Multi-GPU (8×L20): 227.85 tokens/s total throughput with DeepSeek-R1 FP8

Source: https://lmsys.org/blog/2025-10-22-KTransformers/

This provides users with concrete performance metrics for both core modules,
making it easier to understand the capabilities of each component.

* refactor: improve kt-kernel performance data with specific hardware and models

Replace generic performance descriptions with concrete benchmarks:
- Specify exact hardware: 8×L20 GPU + Xeon Gold 6454S, Single/Dual-socket Xeon + AMX
- Include specific models: DeepSeek-R1-0528 (FP8), DeepSeek-V3 (671B)
- Show detailed metrics: total throughput, output throughput, concurrency details
- Match KT-SFT presentation style for consistency

This provides users with actionable performance data they can use to evaluate
hardware requirements and expected performance for their use cases.

* fix README

* docs: clean up performance table and improve formatting

* add pic for README

* refactor: simplify .gitmodules and backup legacy submodules

- Remove 7 legacy submodules from root .gitmodules (archive/third_party/*)
- Keep only 2 active submodules for kt-kernel (llama.cpp, pybind11)
- Backup complete .gitmodules to archive/.gitmodules
- Add documentation in archive/README.md for researchers who need legacy submodules

This reduces initial clone size by ~500MB and avoids downloading unused dependencies.

* refactor: move doc/ back to root directory

Keep documentation in root for easier access and maintenance.

* refactor: consolidate all images to doc/assets/

- Move kt-kernel/assets/heterogeneous_computing.png to doc/assets/
- Remove KT-SFT/assets/ (images already in doc/assets/)
- Update KT-SFT/README.md image references to ../doc/assets/
- Eliminates ~7.9MB image duplication
- Centralizes all documentation assets in one location

* fix pic path for README
2025-11-10 17:42:26 +08:00