diff --git a/.gitmodules b/.gitmodules
index 6c378f9..7f6e818 100644
--- a/.gitmodules
+++ b/.gitmodules
@@ -1,25 +1,3 @@
-[submodule "third_party/llama.cpp"]
- path = third_party/llama.cpp
- url = https://github.com/ggerganov/llama.cpp.git
-[submodule "third_party/pybind11"]
- path = third_party/pybind11
- url = https://github.com/pybind/pybind11.git
-[submodule "third_party/spdlog"]
- path = third_party/spdlog
- url = https://github.com/gabime/spdlog.git
-[submodule "third_party/custom_flashinfer"]
- path = third_party/custom_flashinfer
- url = https://github.com/kvcache-ai/custom_flashinfer.git
- branch = fix-precision-mla-merge-main
-[submodule "third_party/xxHash"]
- path = third_party/xxHash
- url = https://github.com/Cyan4973/xxHash.git
-[submodule "third_party/prometheus-cpp"]
- path = third_party/prometheus-cpp
- url = https://github.com/jupp0r/prometheus-cpp
-[submodule "third_party/PhotonLibOS"]
- path = third_party/PhotonLibOS
- url = https://github.com/alibaba/PhotonLibOS.git
[submodule "kt-kernel/third_party/llama.cpp"]
path = kt-kernel/third_party/llama.cpp
url = https://github.com/ggerganov/llama.cpp.git
diff --git a/README.md b/README.md
index 92ba915..56cb4d8 100644
--- a/README.md
+++ b/README.md
@@ -1,217 +1,136 @@
-🎉 Introduction
-KTransformers, pronounced as Quick Transformers, is designed to enhance your 🤗 Transformers experience with advanced kernel optimizations and placement/parallelism strategies.
-
-KTransformers is a flexible, Python-centric framework designed with extensibility at its core.
-By implementing and injecting an optimized module with a single line of code, users gain access to a Transformers-compatible
-interface, RESTful APIs compliant with OpenAI and Ollama, and even a simplified ChatGPT-like web UI.
-
-Our vision for KTransformers is to serve as a flexible platform for experimenting with innovative LLM inference optimizations. Please let us know if you need any other features.
+## 🎯 Overview
-🔥 Updates
+KTransformers is a research project focused on efficient inference and fine-tuning of large language models through CPU-GPU heterogeneous computing. The project has evolved into **two core modules**: [kt-kernel](./kt-kernel/) and [KT-SFT](./KT-SFT/).
-* **Nov 6, 2025**: Support Kimi-K2-Thinking inference ([Tutorial](./doc/en/Kimi-K2-Thinking.md)) and fine-tune ([Tutorial](./doc/en/SFT_Installation_Guide_KimiK2.md))
-* **Nov 4, 2025**: KTransformers Fine-Tuning × LLaMA-Factory Integration. ([Tutorial](./doc/en/KTransformers-Fine-Tuning_User-Guide.md))
-* **Oct 27, 2025**: Support Ascend NPU. ([Tutorial](./doc/zh/DeepseekR1_V3_tutorial_zh_for_Ascend_NPU.md))
-* **Oct 10, 2025**: Integrating into SGLang. ([Roadmap](https://github.com/sgl-project/sglang/issues/11425))
-* **Sept 11, 2025**: Support Qwen3-Next. ([Tutorial](./doc/en/Qwen3-Next.md))
-* **Sept 05, 2025**: Support Kimi-K2-0905. ([Tutorial](./doc/en/Kimi-K2.md))
-* **July 26, 2025**: Support SmallThinker and GLM4-MoE. ([Tutorial](./doc/en/SmallThinker_and_Glm4moe.md))
-* **July 11, 2025**: Support Kimi-K2. ([Tutorial](./doc/en/Kimi-K2.md))
-* **June 30, 2025**: Support 3-layer (GPU-CPU-Disk) [prefix cache](./doc/en/prefix_cache.md) reuse.
-* **May 14, 2025**: Support Intel Arc GPU ([Tutorial](./doc/en/xpu.md)).
-* **Apr 29, 2025**: Support AMX-Int8、 AMX-BF16 and Qwen3MoE ([Tutorial](./doc/en/AMX.md))
+## 🔥 Updates
-https://github.com/user-attachments/assets/fafe8aec-4e22-49a8-8553-59fb5c6b00a2
+* **Nov 6, 2025**: Support Kimi-K2-Thinking inference and fine-tune
+* **Nov 4, 2025**: KTransformers Fine-Tuning × LLaMA-Factory Integration
+* **Oct 27, 2025**: Support Ascend NPU
+* **Oct 10, 2025**: Integrating into SGLang ([Roadmap](https://github.com/sgl-project/sglang/issues/11425), [Blog](https://lmsys.org/blog/2025-10-22-KTransformers/))
+* **Sept 11, 2025**: Support Qwen3-Next
+* **Sept 05, 2025**: Support Kimi-K2-0905
+* **July 26, 2025**: Support SmallThinker and GLM4-MoE
+* **June 30, 2025**: Support 3-layer (GPU-CPU-Disk) prefix cache reuse
+* **May 14, 2025**: Support Intel Arc GPU
+* **Apr 29, 2025**: Support AMX-Int8, AMX-BF16, and Qwen3MoE
+* **Apr 9, 2025**: Experimental support for LLaMA 4 models
+* **Apr 2, 2025**: Support Multi-concurrency
+* **Mar 15, 2025**: Support ROCm on AMD GPU
+* **Mar 5, 2025**: Support unsloth 1.58/2.51-bit weights and IQ1_S/FP8 hybrid weights; 139K-token context for DeepSeek-V3/R1 in 24GB VRAM
+* **Feb 25, 2025**: Support FP8 GPU kernel for DeepSeek-V3 and R1
+* **Feb 10, 2025**: Support DeepSeek-R1 and V3, with up to 3~28x speedup
-* **Apr 9, 2025**: Experimental support for LLaMA 4 models ([Tutorial](./doc/en/llama4.md)).
-* **Apr 2, 2025**: Support Multi-concurrency. ([Tutorial](./doc/en/balance-serve.md)).
+---
-https://github.com/user-attachments/assets/faa3bda2-928b-45a7-b44f-21e12ec84b8a
+## 📦 Core Modules
-* **Mar 15, 2025**: Support ROCm on AMD GPU ([Tutorial](./doc/en/ROCm.md)).
-* **Mar 5, 2025**: Support unsloth 1.58/2.51 bits weights and [IQ1_S/FP8 hybrid](./doc/en/fp8_kernel.md) weights. Support 139K [Longer Context](./doc/en/DeepseekR1_V3_tutorial.md#v022--v023-longer-context--fp8-kernel) for DeepSeek-V3 and R1 in 24GB VRAM.
-* **Feb 25, 2025**: Support [FP8 GPU kernel](./doc/en/fp8_kernel.md) for DeepSeek-V3 and R1; [Longer Context](./doc/en/DeepseekR1_V3_tutorial.md#v022-longer-context).
-* **Feb 15, 2025**: Longer Context (from 4K to 8K for 24GB VRAM) & Slightly Faster Speed (+15%, up to 16 Tokens/s), update [docs](./doc/en/DeepseekR1_V3_tutorial.md) and [online books](https://kvcache-ai.github.io/ktransformers/).
-* **Feb 10, 2025**: Support Deepseek-R1 and V3 on single (24GB VRAM)/multi gpu and 382G DRAM, up to 3~28x speedup. For detailed show case and reproduction tutorial, see [here](./doc/en/DeepseekR1_V3_tutorial.md).
-* **Aug 28, 2024**: Decrease DeepseekV2's required VRAM from 21G to 11G.
-* **Aug 15, 2024**: Update detailed [tutorial](doc/en/injection_tutorial.md) for injection and multi-GPU.
-* **Aug 14, 2024**: Support llamfile as linear backend.
-* **Aug 12, 2024**: Support multiple GPU; Support new model: mixtral 8\*7B and 8\*22B; Support q2k, q3k, q5k dequant on gpu.
-* **Aug 9, 2024**: Support windows native.
+### 🚀 [kt-kernel](./kt-kernel/) - High-Performance Inference Kernels
-
+CPU-optimized kernel operations for heterogeneous LLM inference.
-🌟 Show Cases
+
-
-
GPT-4/o1-level Local VSCode Copilot on a Desktop with only 24GB VRAM
-
+**Key Features:**
+- **AMX/AVX Acceleration**: Intel AMX- and AVX512/AVX2-optimized kernels for INT4/INT8 quantized inference
+- **MoE Optimization**: Efficient Mixture-of-Experts inference with NUMA-aware memory management
+- **Quantization Support**: CPU-side INT4/INT8 quantized weights, GPU-side GPTQ support
+- **Easy Integration**: Clean Python API for SGLang and other frameworks
-https://github.com/user-attachments/assets/ebd70bfa-b2c1-4abb-ae3b-296ed38aa285
-
-
-
-- **[NEW!!!] Local 671B DeepSeek-Coder-V3/R1:** Running its Q4_K_M version using only 14GB VRAM and 382GB DRAM([Tutorial](./doc/en/DeepseekR1_V3_tutorial.md)).
-
- - Prefill Speed (tokens/s):
- - KTransformers: 54.21 (32 cores) → 74.362 (dual-socket, 2×32 cores) → 255.26 (optimized AMX-based MoE kernel, V0.3 only) → 286.55 (selectively using 6 experts, V0.3 only)
- - Compared to 10.31 tokens/s in llama.cpp with 2×32 cores, achieving up to **27.79× speedup**.
- - Decode Speed (tokens/s):
- - KTransformers: 8.73 (32 cores) → 11.26 (dual-socket, 2×32 cores) → 13.69 (selectively using 6 experts, V0.3 only)
- - Compared to 4.51 tokens/s in llama.cpp with 2×32 cores, achieving up to **3.03× speedup**.
- - Upcoming Open Source Release:
- - AMX optimizations and selective expert activation will be open-sourced in V0.3.
- - Currently available only in preview binary distribution, which can be downloaded [here](./doc/en/DeepseekR1_V3_tutorial.md).
-- **Local 236B DeepSeek-Coder-V2:** Running its Q4_K_M version using only 21GB VRAM and 136GB DRAM, attainable on a local desktop machine, which scores even better than GPT4-0613 in [BigCodeBench](https://huggingface.co/blog/leaderboard-bigcodebench).
-
-
-
-
-
-
-
-- **Faster Speed:** Achieving 126 tokens/s for 2K prompt prefill and 13.6 tokens/s for generation through MoE offloading and injecting advanced kernels from [Llamafile](https://github.com/Mozilla-Ocho/llamafile/tree/main) and [Marlin](https://github.com/IST-DASLab/marlin).
-- **VSCode Integration:** Wrapped into an OpenAI and Ollama compatible API for seamless integration as a backend for [Tabby](https://github.com/TabbyML/tabby) and various other frontends.
-
-
-
-https://github.com/user-attachments/assets/4c6a8a38-05aa-497d-8eb1-3a5b3918429c
-
-
-
-
-
-More advanced features will coming soon, so stay tuned!
-
-🚀 Quick Start
-
-Getting started with KTransformers is simple! Follow the steps below to set up and start using it.
-
-we have already supported vendors:
-
-- Metax
-- Sanechips (ZhuFeng V1.0)
-- Intel
-- Ascend
-- Kunpeng
-- AMD
-
-### 📥 Installation
-
-To install KTransformers, follow the official [Installation Guide](https://kvcache-ai.github.io/ktransformers/en/install.html).
-
-📃 Brief Injection Tutorial
-At the heart of KTransformers is a user-friendly, template-based injection framework.
-This allows researchers to easily replace original torch modules with optimized variants. It also simplifies the process of combining multiple optimizations, allowing the exploration of their synergistic effects.
-
-
-
-
-
-
-
-
-Given that vLLM already serves as a great framework for large-scale deployment optimizations, KTransformers is particularly focused on local deployments that are constrained by limited resources. We pay special attention to heterogeneous computing opportunities, such as GPU/CPU offloading of quantized models. For example, we support the efficient Llamafile and Marlin kernels for CPU and GPU, respectively. More details can be found here.
-
-Example Usage
-To utilize the provided kernels, users only need to create a YAML-based injection template and add the call to `optimize_and_load_gguf` before using the Transformers model.
-
-```python
-with torch.device("meta"):
- model = AutoModelForCausalLM.from_config(config, trust_remote_code=True)
-optimize_and_load_gguf(model, optimize_config_path, gguf_path, config)
-...
-generated = prefill_and_generate(model, tokenizer, input_tensor.cuda(), max_new_tokens=1000)
+**Quick Start:**
+```bash
+cd kt-kernel
+pip install .
```
-In this example, the AutoModel is first initialized on the meta device to avoid occupying any memory resources. Then, `optimize_and_load_gguf` iterates through all sub-modules of the model, matches rules specified in your YAML rule file, and replaces them with advanced modules as specified.
+**Use Cases:**
-After injection, the original `generate` interface is available, but we also provide a compatible `prefill_and_generate` method, which enables further optimizations like CUDAGraph to improve generation speed.
+- CPU-GPU hybrid inference for large MoE models
+- Integration with SGLang for production serving
+- Heterogeneous expert placement (hot experts on GPU, cold experts on CPU)
-How to custom your model
+**Performance Examples:**
+| Model | Hardware Configuration | Total Throughput | Output Throughput |
+|-------|------------------------|------------------|-------------------|
+| DeepSeek-R1-0528 (FP8) | 8×L20 GPU + Xeon Gold 6454S | 227.85 tokens/s | 87.58 tokens/s (8-way concurrency) |
-A detailed tutorial of the injection and multi-GPU using DeepSeek-V2 as an example is given [here](doc/en/injection_tutorial.md).
+👉 **[Full Documentation →](./kt-kernel/README.md)**
-Below is an example of a YAML template for replacing all original Linear modules with Marlin, an advanced 4-bit quantization kernel.
+---
-```yaml
-- match:
- name: "^model\\.layers\\..*$" # regular expression
- class: torch.nn.Linear # only match modules matching name and class simultaneously
- replace:
- class: ktransformers.operators.linear.KTransformerLinear # optimized Kernel on quantized data types
- device: "cpu" # which devices to load this module when initializing
- kwargs:
- generate_device: "cuda"
- generate_linear_type: "QuantizedLinearMarlin"
+### 🎓 [KT-SFT](./KT-SFT/) - Fine-Tuning Framework
+
+KTransformers × LLaMA-Factory integration for ultra-large MoE model fine-tuning.
+
+
+
+**Key Features:**
+
+- **Resource Efficient**: Fine-tune 671B DeepSeek-V3 with just **70GB GPU memory** + 1.3TB RAM
+- **LoRA Support**: Full LoRA fine-tuning with heterogeneous acceleration
+- **LLaMA-Factory Integration**: Seamless integration with popular fine-tuning framework
+- **Production Ready**: Chat, batch inference, and metrics evaluation
+
+**Performance Examples:**
+
+| Model | Configuration | Throughput | GPU Memory |
+|-------|--------------|------------|------------|
+| DeepSeek-V3 (671B) | LoRA + AMX | ~40 tokens/s | 70GB (multi-GPU) |
+| DeepSeek-V2-Lite (14B) | LoRA + AMX | ~530 tokens/s | 6GB |
+
+**Quick Start:**
+```bash
+cd KT-SFT
+# Install environment following KT-SFT/README.md
+USE_KT=1 llamafactory-cli train examples/train_lora/deepseek3_lora_sft_kt.yaml
```
-Each rule in the YAML file has two parts: `match` and `replace`. The `match` part specifies which module should be replaced, and the `replace` part specifies the module to be injected into the model along with the initialization keywords.
+👉 **[Full Documentation →](./KT-SFT/README.md)**
-You can find example rule templates for optimizing DeepSeek-V2 and Qwen2-57B-A14, two SOTA MoE models, in the [ktransformers/optimize/optimize_rules](ktransformers/optimize/optimize_rules) directory. These templates are used to power the `local_chat.py` demo.
+---
-If you are interested in our design principles and the implementation of the injection framework, please refer to the [design document](doc/en/deepseek-v2-injection.md).
+## 🔥 Citation
-🔥 Citation
+If you use KTransformers in your research, please cite our paper:
-If you use KTransformers for your research, please cite our [paper](https://madsys.cs.tsinghua.edu.cn/publication/ktransformers-unleashing-the-full-potential-of-cpu/gpu-hybrid-inference-for-moe-models/):
-
-```
+```bibtex
@inproceedings{10.1145/3731569.3764843,
-title = {KTransformers: Unleashing the Full Potential of CPU/GPU Hybrid Inference for MoE Models},
-author = {Chen, Hongtao and Xie, Weiyu and Zhang, Boxin and Tang, Jingqi and Wang, Jiahao and Dong, Jianwei and Chen, Shaoyuan and Yuan, Ziwei and Lin, Chen and Qiu, Chengyu and Zhu, Yuening and Ou, Qingliang and Liao, Jiaqi and Chen, Xianglin and Ai, Zhiyuan and Wu, Yongwei and Zhang, Mingxing},
-booktitle = {Proceedings of the ACM SIGOPS 31st Symposium on Operating Systems Principles},
-year = {2025}
+ title = {KTransformers: Unleashing the Full Potential of CPU/GPU Hybrid Inference for MoE Models},
+ author = {Chen, Hongtao and Xie, Weiyu and Zhang, Boxin and Tang, Jingqi and Wang, Jiahao and Dong, Jianwei and Chen, Shaoyuan and Yuan, Ziwei and Lin, Chen and Qiu, Chengyu and Zhu, Yuening and Ou, Qingliang and Liao, Jiaqi and Chen, Xianglin and Ai, Zhiyuan and Wu, Yongwei and Zhang, Mingxing},
+ booktitle = {Proceedings of the ACM SIGOPS 31st Symposium on Operating Systems Principles},
+ year = {2025}
}
```
-Acknowledgment and Contributors
+## 👥 Contributors & Team
-The development of KTransformers is based on the flexible and versatile framework provided by Transformers. We also benefit from advanced kernels such as GGUF/GGML, Llamafile, Marlin, sglang and flashinfer. We are planning to contribute back to the community by upstreaming our modifications.
+Developed and maintained by:
+- [MADSys Lab](https://madsys.cs.tsinghua.edu.cn/) @ Tsinghua University
+- [Approaching.AI](http://approaching.ai/)
+- Community contributors
-KTransformers is actively maintained and developed by contributors from the MADSys group at Tsinghua University and members from Approaching.AI. We welcome new contributors to join us in making KTransformers faster and easier to use.
+We welcome contributions! Please feel free to submit issues and pull requests.
-Discussion
+## 💬 Community & Support
-If you have any questions, feel free to open an issue. Alternatively, you can join our WeChat group for further discussion. QR Code: [WeChat Group](WeChatGroup.png)
+- **GitHub Issues**: [Report bugs or request features](https://github.com/kvcache-ai/ktransformers/issues)
+- **GitHub Discussions**: [Ask questions and share ideas](https://github.com/kvcache-ai/ktransformers/discussions)
+- **WeChat Group**: See [archive/WeChatGroup.png](./archive/WeChatGroup.png)
-🙋 FAQ
+## 📦 Legacy Code
-Some common questions are answered in the [FAQ](doc/en/FAQ.md).
+The original integrated KTransformers framework has been archived to the [`archive/`](./archive/) directory for reference. The project now focuses on the two core modules above for better modularity and maintainability.
+
+For the original documentation with full quick-start guides and examples, see:
+- [archive/README_LEGACY.md](./archive/README_LEGACY.md) (English)
+- [archive/README_ZH_LEGACY.md](./archive/README_ZH_LEGACY.md) (中文)
diff --git a/README_ZH.md b/README_ZH.md
index 48f28e0..d91ee80 100644
--- a/README_ZH.md
+++ b/README_ZH.md
@@ -1,166 +1,132 @@
-🎉 介绍
-KTransformers(发音为 Quick Transformers)旨在通过先进的内核优化和放置/并行策略来增强您对 🤗 [Transformers](https://github.com/huggingface/transformers) 的体验。
-
-KTransformers 是一个以 Python 为中心的灵活框架,其核心是可扩展性。通过用一行代码实现并注入优化模块,用户可以获得与 Transformers 兼容的接口、符合 OpenAI 和 Ollama 的 RESTful API,甚至是一个简化的类似 ChatGPT 的 Web 界面。
-
-我们对 KTransformers 的愿景是成为一个用于实验创新 LLM 推理优化的灵活平台。如果您需要任何其他功能,请告诉我们。
+## 🎯 项目概述
-🔥 更新
+KTransformers 是一个专注于大语言模型高效推理和微调的研究项目,通过 CPU-GPU 异构计算实现资源受限环境下的模型部署。项目已演进为**两个核心模块**:[kt-kernel](./kt-kernel/) 和 [KT-SFT](./KT-SFT/)。
-* **2025 年 2 月 15 日**:为DeepSeek-V3/R1支持[FP8 GPU内核](./doc/en/fp8_kernel.md); 支持更长的上下文([教程](./doc/en/DeepseekR1_V3_tutorial.md#v022-longer-context)).
-* **2025 年 2 月 15 日**:长上下文(从4K到8K,24GB VRAM) & 稍快的速度(+15%)(最快 16 Tokens/s),文档请参见 [这里](./doc/en/DeepseekR1_V3_tutorial.md) 和 [在线指南](https://kvcache-ai.github.io/ktransformers/) 。
-* **2025 年 2 月 10 日**:支持 Deepseek-R1 和 V3 在单个(24GB VRAM)/多 GPU 和 382G DRAM 上运行,速度提升高达 3~28 倍。详细教程请参见 [这里](./doc/en/DeepseekR1_V3_tutorial.md)。
-* **2024 年 8 月 28 日**:支持 InternLM2.5-7B-Chat-1M 模型下的 1M 上下文,使用 24GB 的 VRAM 和 150GB 的 DRAM。详细教程请参见 [这里](./doc/en/long_context_tutorial.md)。
-* **2024 年 8 月 28 日**:将 DeepseekV2 所需的 VRAM 从 21G 降低到 11G。
-* **2024 年 8 月 15 日**:更新了详细的 [教程](doc/en/injection_tutorial.md),介绍注入和多 GPU 的使用。
-* **2024 年 8 月 14 日**:支持 llamfile 作为线性后端。
-* **2024 年 8 月 12 日**:支持多 GPU;支持新模型:mixtral 8\*7B 和 8\*22B;支持 q2k、q3k、q5k 在 GPU 上的去量化。
-* **2024 年 8 月 9 日**:支持 Windows。
+## 🔥 更新
-🌟 案例展示
+* **2025年11月6日**:支持 Kimi-K2-Thinking 推理和微调
+* **2025年11月4日**:KTransformers 微调 × LLaMA-Factory 集成
+* **2025年10月27日**:支持 Ascend NPU
+* **2025年10月10日**:集成到 SGLang ([路线图](https://github.com/sgl-project/sglang/issues/11425), [博客](https://lmsys.org/blog/2025-10-22-KTransformers/))
+* **2025年9月11日**:支持 Qwen3-Next
+* **2025年9月5日**:支持 Kimi-K2-0905
+* **2025年7月26日**:支持 SmallThinker 和 GLM4-MoE
+* **2025年6月30日**:支持 3层(GPU-CPU-磁盘)前缀缓存复用
+* **2025年5月14日**:支持 Intel Arc GPU
+* **2025年4月29日**:支持 AMX-Int8、AMX-BF16 和 Qwen3MoE
+* **2025年4月9日**:实验性支持 LLaMA 4 模型
+* **2025年4月2日**:支持多并发
+* **2025年3月15日**:支持 AMD GPU 的 ROCm
+* **2025年3月5日**:支持 unsloth 1.58/2.51 bits 权重和 IQ1_S/FP8 混合权重;DeepSeek-V3/R1 支持 139K 长上下文
+* **2025年2月25日**:支持 DeepSeek-V3 和 R1 的 FP8 GPU 内核
+* **2025年2月10日**:支持 Deepseek-R1 和 V3,速度提升最高达 3~28 倍
-
-
在仅 24GB VRAM 的桌面上运行 GPT-4/o1 级别的本地 VSCode Copilot
-
+---
-https://github.com/user-attachments/assets/ebd70bfa-b2c1-4abb-ae3b-296ed38aa285
+## 📦 核心模块
-
+### 🚀 [kt-kernel](./kt-kernel/) - 高性能推理内核
-- **[NEW!!!] 本地 671B DeepSeek-Coder-V3/R1**:使用其 Q4_K_M 版本,仅需 14GB VRAM 和 382GB DRAM 即可运行(教程请参见 [这里](./doc/en/DeepseekR1_V3_tutorial.md))。
- - 预填充速度(tokens/s):
- - KTransformers:54.21(32 核)→ 74.362(双插槽,2×32 核)→ 255.26(优化的 AMX 基 MoE 内核,仅 V0.3)→ 286.55(选择性使用 6 个专家,仅 V0.3)
- - 与 llama.cpp 在 2×32 核下相比,达到 **27.79× 速度提升**。
- - 解码速度(tokens/s):
- - KTransformers:8.73(32 核)→ 11.26(双插槽,2×32 核)→ 13.69(选择性使用 6 个专家,仅 V0.3)
- - 与 llama.cpp 在 2×32 核下相比,达到 **3.03× 速度提升**。
- - 即将开源发布:
- - AMX 优化和选择性专家激活将在 V0.3 中开源。
- - 目前仅在预览二进制分发中可用,可从 [这里](./doc/en/DeepseekR1_V3_tutorial.md) 下载。
+面向异构 LLM 推理的 CPU 优化内核操作库。
-- **本地 236B DeepSeek-Coder-V2**:使用其 Q4_K_M 版本,仅需 21GB VRAM 和 136GB DRAM 即可运行,甚至在 [BigCodeBench](https://huggingface.co/blog/leaderboard-bigcodebench) 中得分超过 GPT4-0613。
+
-
-
-
-
-
+**核心特性:**
+- **AMX/AVX 加速**:Intel AMX 和 AVX512/AVX2 优化内核,支持 INT4/INT8 量化推理
+- **MoE 优化**:高效的专家混合推理,支持 NUMA 感知内存管理
+- **量化支持**:CPU 端 INT4/INT8 量化权重,GPU 端 GPTQ 支持
+- **易于集成**:简洁的 Python API,可集成到 SGLang 等框架
-- **更快的速度**:通过 MoE 卸载和注入来自 [Llamafile](https://github.com/Mozilla-Ocho/llamafile/tree/main) 和 [Marlin](https://github.com/IST-DASLab/marlin) 的高级内核,实现了 2K 提示预填充 126 tokens/s 和生成 13.6 tokens/s 的速度。
-- **VSCode 集成**:封装成符合 OpenAI 和 Ollama 的 API,可无缝集成到 [Tabby](https://github.com/TabbyML/tabby) 和其他前端的后端。
-
-
-
-https://github.com/user-attachments/assets/4c6a8a38-05aa-497d-8eb1-3a5b3918429c
-
-
-
-
-
-
-
-
-更多高级功能即将推出,敬请期待!
-
-🚀 快速入门
-
-
-KTransformers 的入门非常简单!请参考我们的[安装指南]((https://kvcache-ai.github.io/ktransformers/))进行安装。
-
-📃 简要注入教程
-KTransformers 的核心是一个用户友好的、基于模板的注入框架。这使得研究人员可以轻松地将原始 torch 模块替换为优化的变体。它还简化了多种优化的组合过程,允许探索它们的协同效应。
-
-
-
-
-
-
-
-鉴于 vLLM 已经是一个用于大规模部署优化的优秀框架,KTransformers 特别关注受资源限制的本地部署。我们特别关注异构计算时机,例如量化模型的 GPU/CPU 卸载。例如,我们支持高效的 Llamafile 和Marlin 内核,分别用于 CPU 和 GPU。 更多详细信息可以在 这里找到。
-
-
-示例用法
-要使用提供的内核,用户只需创建一个基于 YAML 的注入模板,并在使用 Transformers 模型之前添加对 `optimize_and_load_gguf` 的调用。
-
-```python
-with torch.device("meta"):
- model = AutoModelForCausalLM.from_config(config, trust_remote_code=True)
-optimize_and_load_gguf(model, optimize_config_path, gguf_path, config)
-...
-generated = prefill_and_generate(model, tokenizer, input_tensor.cuda(), max_new_tokens=1000)
+**快速开始:**
+```bash
+cd kt-kernel
+pip install .
```
-在这个示例中,首先在 meta 设备上初始化 AutoModel,以避免占用任何内存资源。然后,`optimize_and_load_gguf` 遍历模型的所有子模块,匹配您的 YAML 规则文件中指定的规则,并将它们替换为指定的高级模块。
+**应用场景:**
+- 大型 MoE 模型的 CPU-GPU 混合推理
+- 与 SGLang 集成用于生产服务
+- 异构专家放置(热门专家在 GPU,冷门专家在 CPU)
-注入后,原始的 `generate` 接口仍然可用,但我们还提供了一个兼容的 `prefill_and_generate` 方法,这使得可以进一步优化,例如使用 CUDAGraph 提高生成速度。
+**性能示例:**
+| 模型 | 硬件配置 | 总吞吐量 | 输出吞吐量 |
+|------|---------|---------|-----------|
+| DeepSeek-R1-0528 (FP8) | 8×L20 GPU + Xeon Gold 6454S | 227.85 tokens/s | 87.58 tokens/s(8路并发)|
-如何自定义您的模型
+👉 **[完整文档 →](./kt-kernel/README.md)**
-一个详细的使用 DeepSeek-V2 作为示例的注入和 multi-GPU 教程在 [这里](doc/en/injection_tutorial.md)。
+---
-以下是一个将所有原始 Linear 模块替换为 Marlin 的 YAML 模板示例,Marlin 是一个高级的 4 位量化内核。
+### 🎓 [KT-SFT](./KT-SFT/) - 微调框架
-```yaml
-- match:
- name: "^model\\.layers\\..*$" # 正则表达式
- class: torch.nn.Linear # 仅匹配同时符合名称和类的模块
- replace:
- class: ktransformers.operators.linear.KTransformerLinear # 量化数据类型的优化内核
- device: "cpu" # 初始化时加载该模块的 device
- kwargs:
- generate_device: "cuda"
- generate_linear_type: "QuantizedLinearMarlin"
+KTransformers × LLaMA-Factory 集成,支持超大 MoE 模型微调。
+
+
+
+**核心特性:**
+- **资源高效**:仅需 **70GB 显存** + 1.3TB 内存即可微调 671B DeepSeek-V3
+- **LoRA 支持**:完整的 LoRA 微调与异构加速
+- **LLaMA-Factory 集成**:与流行微调框架无缝集成
+- **生产就绪**:支持对话、批量推理和指标评估
+
+**性能示例:**
+| 模型 | 配置 | 吞吐量 | GPU 显存 |
+|------|------|--------|----------|
+| DeepSeek-V3 (671B) | LoRA + AMX | ~40 tokens/s | 70GB (多卡) |
+| DeepSeek-V2-Lite (14B) | LoRA + AMX | ~530 tokens/s | 6GB |
+
+**快速开始:**
+```bash
+cd KT-SFT
+# 按照 KT-SFT/README.md 安装环境
+USE_KT=1 llamafactory-cli train examples/train_lora/deepseek3_lora_sft_kt.yaml
```
-YAML 文件中的每个规则都有两部分:`match` 和 `replace`。`match` 部分指定应替换的模块,`replace` 部分指定要注入到模型中的模块以及初始化关键字。
+👉 **[完整文档 →](./KT-SFT/README.md)**
-您可以在 [ktransformers/optimize/optimize_rules](ktransformers/optimize/optimize_rules) 目录中找到用于优化 DeepSeek-V2 和 Qwen2-57B-A14 的示例规则模板。这些模板用于为 `local_chat.py` 示例提供支持。
+---
-如果您对我们的设计原则和注入框架的实现感兴趣,请参考 [设计文档](doc/en/deepseek-v2-injection.md)。
+## 🔥 引用
-致谢和贡献者
+如果您在研究中使用了 KTransformers,请引用我们的论文:
-KTransformers 的开发基于 Transformers 提供的灵活和多功能框架。我们还受益于 GGUF/GGML、Llamafile 、 Marlin、sglang和flashinfer 等高级内核。我们计划通过向上游贡献我们的修改来回馈社区。
+```bibtex
+@inproceedings{10.1145/3731569.3764843,
+ title = {KTransformers: Unleashing the Full Potential of CPU/GPU Hybrid Inference for MoE Models},
+ author = {Chen, Hongtao and Xie, Weiyu and Zhang, Boxin and Tang, Jingqi and Wang, Jiahao and Dong, Jianwei and Chen, Shaoyuan and Yuan, Ziwei and Lin, Chen and Qiu, Chengyu and Zhu, Yuening and Ou, Qingliang and Liao, Jiaqi and Chen, Xianglin and Ai, Zhiyuan and Wu, Yongwei and Zhang, Mingxing},
+ booktitle = {Proceedings of the ACM SIGOPS 31st Symposium on Operating Systems Principles},
+ year = {2025}
+}
+```
-KTransformers 由清华大学 MADSys group 小组的成员以及 Approaching.AI 的成员积极维护和开发。我们欢迎新的贡献者加入我们,使 KTransformers 更快、更易于使用。
+## 👥 贡献者与团队
+由以下团队开发和维护:
+- 清华大学 [MADSys 实验室](https://madsys.cs.tsinghua.edu.cn/)
+- [Approaching.AI](http://approaching.ai/)
+- 社区贡献者
-讨论
+我们欢迎贡献!请随时提交 issues 和 pull requests。
-如果您有任何问题,欢迎随时提出 issue。或者,您可以加入我们的微信群进行进一步讨论。二维码: [微信群](WeChatGroup.png)
+## 💬 社区与支持
-🙋 常见问题
+- **GitHub Issues**:[报告 bug 或请求功能](https://github.com/kvcache-ai/ktransformers/issues)
+- **GitHub Discussions**:[提问和分享想法](https://github.com/kvcache-ai/ktransformers/discussions)
+- **微信群**:查看 [archive/WeChatGroup.png](./archive/WeChatGroup.png)
-一些常见问题的答案可以在 [FAQ](doc/en/FAQ.md) 中找到。
+## 📦 历史代码
+
+原完整的 KTransformers 框架代码已归档至 [`archive/`](./archive/) 目录供参考。项目现专注于上述两个核心模块,以实现更好的模块化和可维护性。
+
+关于原始完整文档(包含快速入门指南和示例),请查看:
+- [archive/README_LEGACY.md](./archive/README_LEGACY.md) (English)
+- [archive/README_ZH_LEGACY.md](./archive/README_ZH_LEGACY.md) (中文)
diff --git a/.devcontainer/Dockerfile b/archive/.devcontainer/Dockerfile
similarity index 100%
rename from .devcontainer/Dockerfile
rename to archive/.devcontainer/Dockerfile
diff --git a/.devcontainer/devcontainer.json b/archive/.devcontainer/devcontainer.json
similarity index 100%
rename from .devcontainer/devcontainer.json
rename to archive/.devcontainer/devcontainer.json
diff --git a/.flake8 b/archive/.flake8
similarity index 100%
rename from .flake8
rename to archive/.flake8
diff --git a/archive/.gitmodules b/archive/.gitmodules
new file mode 100644
index 0000000..65cf661
--- /dev/null
+++ b/archive/.gitmodules
@@ -0,0 +1,28 @@
+[submodule "third_party/llama.cpp"]
+ path = archive/third_party/llama.cpp
+ url = https://github.com/ggerganov/llama.cpp.git
+[submodule "third_party/pybind11"]
+ path = archive/third_party/pybind11
+ url = https://github.com/pybind/pybind11.git
+[submodule "third_party/spdlog"]
+ path = archive/third_party/spdlog
+ url = https://github.com/gabime/spdlog.git
+[submodule "third_party/custom_flashinfer"]
+ path = archive/third_party/custom_flashinfer
+ url = https://github.com/kvcache-ai/custom_flashinfer.git
+ branch = fix-precision-mla-merge-main
+[submodule "third_party/xxHash"]
+ path = archive/third_party/xxHash
+ url = https://github.com/Cyan4973/xxHash.git
+[submodule "third_party/prometheus-cpp"]
+ path = archive/third_party/prometheus-cpp
+ url = https://github.com/jupp0r/prometheus-cpp
+[submodule "third_party/PhotonLibOS"]
+ path = archive/third_party/PhotonLibOS
+ url = https://github.com/alibaba/PhotonLibOS.git
+[submodule "kt-kernel/third_party/llama.cpp"]
+ path = kt-kernel/third_party/llama.cpp
+ url = https://github.com/ggerganov/llama.cpp.git
+[submodule "kt-kernel/third_party/pybind11"]
+ path = kt-kernel/third_party/pybind11
+ url = https://github.com/pybind/pybind11.git
diff --git a/.pylintrc b/archive/.pylintrc
similarity index 100%
rename from .pylintrc
rename to archive/.pylintrc
diff --git a/Dockerfile b/archive/Dockerfile
similarity index 100%
rename from Dockerfile
rename to archive/Dockerfile
diff --git a/Dockerfile.xpu b/archive/Dockerfile.xpu
similarity index 100%
rename from Dockerfile.xpu
rename to archive/Dockerfile.xpu
diff --git a/LICENSE b/archive/LICENSE
similarity index 100%
rename from LICENSE
rename to archive/LICENSE
diff --git a/MANIFEST.in b/archive/MANIFEST.in
similarity index 100%
rename from MANIFEST.in
rename to archive/MANIFEST.in
diff --git a/Makefile b/archive/Makefile
similarity index 100%
rename from Makefile
rename to archive/Makefile
diff --git a/archive/README.md b/archive/README.md
new file mode 100644
index 0000000..1f37e93
--- /dev/null
+++ b/archive/README.md
@@ -0,0 +1,103 @@
+# Archive - Legacy KTransformers Code
+
+This directory contains the original integrated KTransformers framework code that has been archived as part of the repository restructuring.
+
+## 📋 What's Here
+
+This archive preserves the complete original KTransformers implementation, including:
+
+- **Core Framework** (`ktransformers/`): Original integrated inference framework
+- **C/C++ Extensions** (`csrc/`): Low-level kernel implementations
+- **Third-party Dependencies** (`third_party/`): Vendored external libraries
+- **Git Submodules** (`.gitmodules`): Complete submodule configuration for legacy dependencies
+- **Build System**: Installation scripts, Dockerfiles, and configuration files
+- **Legacy Documentation**: Original README files with full quick-start guides
+
+## 📚 Documentation
+
+### Original README Files
+
+- **[English README (Legacy)](./README_LEGACY.md)**: Complete original English documentation with:
+ - Quick Start guides
+ - Show cases and benchmarks
+ - Injection tutorial
+ - Full installation instructions
+
+- **[中文 README (Legacy)](./README_ZH_LEGACY.md)**: 完整的原始中文文档,包含:
+ - 快速入门指南
+ - 案例展示和基准测试
+ - 注入教程
+ - 完整安装说明
+
+## 🔄 Migration to New Structure
+
+The KTransformers project has evolved into two focused modules:
+
+### For Inference (CPU-optimized kernels):
+→ Use **[kt-kernel](../kt-kernel/)** instead
+
+### For Fine-tuning (LLaMA-Factory integration):
+→ Use **[KT-SFT](../KT-SFT/)** instead
+
+## ⚠️ Status
+
+This code is **archived for reference only**. For active development and support:
+
+- **Inference**: See [kt-kernel](../kt-kernel/)
+- **Fine-tuning**: See [KT-SFT](../KT-SFT/)
+- **Documentation**: See [doc](../doc/) directory
+- **Issues**: Visit [GitHub Issues](https://github.com/kvcache-ai/ktransformers/issues)
+
+## 🔧 Git Submodules (For Researchers)
+
+The root `.gitmodules` only contains kt-kernel's dependencies to keep the repository lightweight. If you need to build the legacy code, you can use the archived submodule configuration:
+
+```bash
+# Copy the complete submodule configuration
+cp archive/.gitmodules .gitmodules
+
+# Initialize legacy submodules
+git submodule update --init --recursive archive/third_party/
+```
+
+**Note**: This will download ~500MB of additional dependencies.
+
+## 📦 Contents Overview
+
+```
+archive/
+├── README.md # This file
+├── README_LEGACY.md # Original English documentation
+├── README_ZH_LEGACY.md # Original Chinese documentation
+├── .gitmodules # Complete git submodule configuration (7 legacy + 2 kt-kernel submodules)
+├── ktransformers/ # Original framework code
+├── csrc/ # C/C++ extensions
+├── third_party/ # External dependencies (submodules not initialized by default)
+├── setup.py # Original installation script
+├── pyproject.toml # Python project configuration
+├── Dockerfile* # Container configurations
+├── install*.sh # Installation scripts
+└── ... # Other legacy files
+```
+
+## 💡 Why Archived?
+
+The original monolithic framework has been refactored into modular components for:
+
+1. **Better Maintainability**: Separated concerns between inference and fine-tuning
+2. **Easier Integration**: Cleaner APIs for external frameworks (SGLang, LLaMA-Factory)
+3. **Focused Development**: Dedicated modules with specific optimization goals
+4. **Reduced Complexity**: Smaller, more manageable codebases
+
+## 🔗 Related Resources
+
+- **Main Repository**: [../README.md](../README.md)
+- **kt-kernel Documentation**: [../kt-kernel/README.md](../kt-kernel/README.md)
+- **KT-SFT Documentation**: [../KT-SFT/README.md](../KT-SFT/README.md)
+- **Project Website**: https://kvcache-ai.github.io/ktransformers/
+
+---
+
+
+ Archived in November 2025 as part of the repository restructuring
+
diff --git a/archive/README_LEGACY.md b/archive/README_LEGACY.md
new file mode 100644
index 0000000..92ba915
--- /dev/null
+++ b/archive/README_LEGACY.md
@@ -0,0 +1,217 @@
+
+
+## 🎉 Introduction
+KTransformers, pronounced as Quick Transformers, is designed to enhance your 🤗 Transformers experience with advanced kernel optimizations and placement/parallelism strategies.
+
+KTransformers is a flexible, Python-centric framework designed with extensibility at its core.
+By implementing and injecting an optimized module with a single line of code, users gain access to a Transformers-compatible
+interface, RESTful APIs compliant with OpenAI and Ollama, and even a simplified ChatGPT-like web UI.
+
+Our vision for KTransformers is to serve as a flexible platform for experimenting with innovative LLM inference optimizations. Please let us know if you need any other features.
+
+## 🔥 Updates
+
+* **Nov 6, 2025**: Support Kimi-K2-Thinking inference ([Tutorial](./doc/en/Kimi-K2-Thinking.md)) and fine-tune ([Tutorial](./doc/en/SFT_Installation_Guide_KimiK2.md))
+* **Nov 4, 2025**: KTransformers Fine-Tuning × LLaMA-Factory integration ([Tutorial](./doc/en/KTransformers-Fine-Tuning_User-Guide.md)).
+* **Oct 27, 2025**: Support Ascend NPU ([Tutorial](./doc/zh/DeepseekR1_V3_tutorial_zh_for_Ascend_NPU.md)).
+* **Oct 10, 2025**: Integration into SGLang ([Roadmap](https://github.com/sgl-project/sglang/issues/11425)).
+* **Sept 11, 2025**: Support Qwen3-Next ([Tutorial](./doc/en/Qwen3-Next.md)).
+* **Sept 05, 2025**: Support Kimi-K2-0905 ([Tutorial](./doc/en/Kimi-K2.md)).
+* **July 26, 2025**: Support SmallThinker and GLM4-MoE ([Tutorial](./doc/en/SmallThinker_and_Glm4moe.md)).
+* **July 11, 2025**: Support Kimi-K2 ([Tutorial](./doc/en/Kimi-K2.md)).
+* **June 30, 2025**: Support 3-layer (GPU-CPU-Disk) [prefix cache](./doc/en/prefix_cache.md) reuse.
+* **May 14, 2025**: Support Intel Arc GPU ([Tutorial](./doc/en/xpu.md)).
+* **Apr 29, 2025**: Support AMX-Int8, AMX-BF16, and Qwen3MoE ([Tutorial](./doc/en/AMX.md)).
+
+https://github.com/user-attachments/assets/fafe8aec-4e22-49a8-8553-59fb5c6b00a2
+
+* **Apr 9, 2025**: Experimental support for LLaMA 4 models ([Tutorial](./doc/en/llama4.md)).
+* **Apr 2, 2025**: Support multi-concurrency ([Tutorial](./doc/en/balance-serve.md)).
+
+https://github.com/user-attachments/assets/faa3bda2-928b-45a7-b44f-21e12ec84b8a
+
+* **Mar 15, 2025**: Support ROCm on AMD GPU ([Tutorial](./doc/en/ROCm.md)).
+* **Mar 5, 2025**: Support unsloth 1.58/2.51 bits weights and [IQ1_S/FP8 hybrid](./doc/en/fp8_kernel.md) weights. Support 139K [Longer Context](./doc/en/DeepseekR1_V3_tutorial.md#v022--v023-longer-context--fp8-kernel) for DeepSeek-V3 and R1 in 24GB VRAM.
+* **Feb 25, 2025**: Support [FP8 GPU kernel](./doc/en/fp8_kernel.md) for DeepSeek-V3 and R1; [Longer Context](./doc/en/DeepseekR1_V3_tutorial.md#v022-longer-context).
+* **Feb 15, 2025**: Longer Context (from 4K to 8K for 24GB VRAM) & Slightly Faster Speed (+15%, up to 16 Tokens/s), update [docs](./doc/en/DeepseekR1_V3_tutorial.md) and [online books](https://kvcache-ai.github.io/ktransformers/).
+* **Feb 10, 2025**: Support DeepSeek-R1 and V3 on a single GPU (24GB VRAM) or multiple GPUs with 382GB DRAM, achieving up to 3~28x speedup. For a detailed showcase and reproduction tutorial, see [here](./doc/en/DeepseekR1_V3_tutorial.md).
+* **Aug 28, 2024**: Decrease DeepseekV2's required VRAM from 21G to 11G.
+* **Aug 15, 2024**: Update detailed [tutorial](doc/en/injection_tutorial.md) for injection and multi-GPU.
+* **Aug 14, 2024**: Support llamafile as a linear backend.
+* **Aug 12, 2024**: Support multiple GPUs; support new models Mixtral 8\*7B and 8\*22B; support q2k, q3k, and q5k dequantization on GPU.
+* **Aug 9, 2024**: Support native Windows.
+
+
+
+## 🌟 Show Cases
+
+
+
+### GPT-4/o1-level Local VSCode Copilot on a Desktop with only 24GB VRAM
+
+
+https://github.com/user-attachments/assets/ebd70bfa-b2c1-4abb-ae3b-296ed38aa285
+
+
+
+- **[NEW!!!] Local 671B DeepSeek-Coder-V3/R1:** Running its Q4_K_M version using only 14GB VRAM and 382GB DRAM ([Tutorial](./doc/en/DeepseekR1_V3_tutorial.md)).
+
+  - Prefill Speed (tokens/s):
+    - KTransformers: 54.21 (32 cores) → 74.362 (dual-socket, 2×32 cores) → 255.26 (optimized AMX-based MoE kernel, V0.3 only) → 286.55 (selectively using 6 experts, V0.3 only)
+    - Compared to 10.31 tokens/s in llama.cpp with 2×32 cores, achieving up to **27.79× speedup**.
+  - Decode Speed (tokens/s):
+    - KTransformers: 8.73 (32 cores) → 11.26 (dual-socket, 2×32 cores) → 13.69 (selectively using 6 experts, V0.3 only)
+    - Compared to 4.51 tokens/s in llama.cpp with 2×32 cores, achieving up to **3.03× speedup**.
+  - Upcoming Open Source Release:
+    - AMX optimizations and selective expert activation will be open-sourced in V0.3.
+    - Currently available only as a preview binary distribution, which can be downloaded [here](./doc/en/DeepseekR1_V3_tutorial.md).
+- **Local 236B DeepSeek-Coder-V2:** Running its Q4_K_M version using only 21GB VRAM and 136GB DRAM, attainable on a local desktop machine, which scores even better than GPT4-0613 in [BigCodeBench](https://huggingface.co/blog/leaderboard-bigcodebench).
+
+
+
+
+
+
+
+- **Faster Speed:** Achieving 126 tokens/s for 2K prompt prefill and 13.6 tokens/s for generation through MoE offloading and injecting advanced kernels from [Llamafile](https://github.com/Mozilla-Ocho/llamafile/tree/main) and [Marlin](https://github.com/IST-DASLab/marlin).
+- **VSCode Integration:** Wrapped into an OpenAI and Ollama compatible API for seamless integration as a backend for [Tabby](https://github.com/TabbyML/tabby) and various other frontends.
+
+
+
+https://github.com/user-attachments/assets/4c6a8a38-05aa-497d-8eb1-3a5b3918429c
+
+
+
+
+
+More advanced features are coming soon, so stay tuned!
+
+## 🚀 Quick Start
+
+Getting started with KTransformers is simple! Follow the steps below to set up and start using it.
+
+We already support the following vendors:
+
+- Metax
+- Sanechips (ZhuFeng V1.0)
+- Intel
+- Ascend
+- Kunpeng
+- AMD
+
+### 📥 Installation
+
+To install KTransformers, follow the official [Installation Guide](https://kvcache-ai.github.io/ktransformers/en/install.html).
+
+## 📃 Brief Injection Tutorial
+
+At the heart of KTransformers is a user-friendly, template-based injection framework.
+This allows researchers to easily replace original torch modules with optimized variants. It also simplifies the process of combining multiple optimizations, allowing the exploration of their synergistic effects.
+
+
+
+
+
+
+
+
+Given that vLLM already serves as a great framework for large-scale deployment optimizations, KTransformers is particularly focused on local deployments that are constrained by limited resources. We pay special attention to heterogeneous computing opportunities, such as GPU/CPU offloading of quantized models. For example, we support the efficient Llamafile and Marlin kernels for CPU and GPU, respectively. More details can be found here.
+
+### Example Usage
+To utilize the provided kernels, users only need to create a YAML-based injection template and add the call to `optimize_and_load_gguf` before using the Transformers model.
+
+```python
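+import torch
+from transformers import AutoModelForCausalLM
+from ktransformers.optimize.optimize import optimize_and_load_gguf
+from ktransformers.util.utils import prefill_and_generate
+
+with torch.device("meta"):  # build the model skeleton without allocating weight memory
+    model = AutoModelForCausalLM.from_config(config, trust_remote_code=True)
+optimize_and_load_gguf(model, optimize_config_path, gguf_path, config)  # inject modules per YAML rules, load GGUF weights
+...
+generated = prefill_and_generate(model, tokenizer, input_tensor.cuda(), max_new_tokens=1000)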
+```
+
+In this example, the AutoModel is first initialized on the meta device to avoid occupying any memory resources. Then, `optimize_and_load_gguf` iterates through all sub-modules of the model, matches rules specified in your YAML rule file, and replaces them with advanced modules as specified.
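The matching step can be illustrated with a minimal sketch. It assumes modules are exposed as `(name, class_name)` pairs, similar to what PyTorch's `Module.named_modules()` yields; the helper `match_modules` and the module list are hypothetical and mirror the rule semantics described here (a regex on the name *and* a class check), not ktransformers' actual implementation.

```python
import re
from typing import Iterable, List, Tuple


def match_modules(
    named_modules: Iterable[Tuple[str, str]],
    name_pattern: str,
    class_name: str,
) -> List[str]:
    """Return names of modules whose name matches the regex AND whose class matches.

    Hypothetical helper: a rule only fires when both conditions hold,
    mirroring the `match` section of a YAML injection rule.
    """
    regex = re.compile(name_pattern)
    return [
        name
        for name, cls in named_modules
        if regex.match(name) and cls == class_name
    ]


# Toy module listing (illustrative data, not a real model dump).
modules = [
    ("model.embed_tokens", "Embedding"),
    ("model.layers.0.self_attn.q_proj", "Linear"),
    ("model.layers.0.mlp.up_proj", "Linear"),
    ("model.layers.0.input_layernorm", "RMSNorm"),
    ("lm_head", "Linear"),
]

print(match_modules(modules, r"^model\.layers\..*$", "Linear"))
# → ['model.layers.0.self_attn.q_proj', 'model.layers.0.mlp.up_proj']
```

`lm_head` is also a `Linear`, but it is skipped because its name falls outside `model.layers`, which is why a rule combines both conditions.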
+
+After injection, the original `generate` interface is available, but we also provide a compatible `prefill_and_generate` method, which enables further optimizations like CUDAGraph to improve generation speed.
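The prefill/decode split behind a method like `prefill_and_generate` can be sketched with a toy greedy loop: the whole prompt is processed in one call (prefill), then tokens are produced one at a time (decode). Because every decode step sees a fixed one-token input, it is a natural candidate for CUDA Graph capture. The `step` interface and `toy_step` model below are hypothetical stand-ins, not the ktransformers API.

```python
from typing import Callable, List, Optional, Tuple

# A "step" consumes new token ids plus an opaque cache and returns
# (next_token_id, updated_cache). Hypothetical interface for illustration.
Step = Callable[[List[int], Optional[list]], Tuple[int, list]]


def greedy_prefill_and_decode(
    step: Step, prompt: List[int], max_new_tokens: int, eos_id: int
) -> List[int]:
    # Prefill: a single call over the full prompt builds the cache.
    next_tok, cache = step(prompt, None)
    out: List[int] = []
    for i in range(max_new_tokens):
        out.append(next_tok)
        if next_tok == eos_id or i == max_new_tokens - 1:
            break
        # Decode: always exactly one new token per call (fixed input shape).
        next_tok, cache = step([next_tok], cache)
    return out


def toy_step(tokens: List[int], cache: Optional[list]) -> Tuple[int, list]:
    # Toy "model": the cache is the token history; it predicts last + 1.
    history = (cache or []) + tokens
    return history[-1] + 1, history


print(greedy_prefill_and_decode(toy_step, [1, 2, 3], max_new_tokens=5, eos_id=6))
# → [4, 5, 6]
```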
+
+### How to Customize Your Model
+
+A detailed tutorial on injection and multi-GPU usage, using DeepSeek-V2 as an example, is available [here](doc/en/injection_tutorial.md).
+
+Below is an example of a YAML template for replacing all original Linear modules with Marlin, an advanced 4-bit quantization kernel.
+
+```yaml
+- match:
+    name: "^model\\.layers\\..*$"  # regular expression
+    class: torch.nn.Linear  # only match modules matching both name and class
+  replace:
+    class: ktransformers.operators.linear.KTransformerLinear  # optimized kernel on quantized data types
+    device: "cpu"  # device on which to load this module during initialization
+    kwargs:
+      generate_device: "cuda"
+      generate_linear_type: "QuantizedLinearMarlin"
+```
+
+Each rule in the YAML file has two parts: `match` and `replace`. The `match` part specifies which module should be replaced, and the `replace` part specifies the module to be injected into the model along with the initialization keywords.
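Before an injection framework can build the new module, it has to turn the dotted `class` string of a `replace` section into an actual Python class. Below is a minimal sketch of that lookup with `importlib`, assuming the framework resolves paths in a similar way. `resolve_class` and `build_replacement` are hypothetical helpers, and the demo swaps in the standard-library `collections.OrderedDict` for an optimized operator so it runs without ktransformers installed.

```python
import importlib
from typing import Any, Dict


def resolve_class(dotted_path: str) -> Any:
    """Split 'pkg.module.ClassName' and import the class (hypothetical helper)."""
    module_path, _, class_name = dotted_path.rpartition(".")
    if not module_path:
        raise ValueError(f"expected a dotted path, got {dotted_path!r}")
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


def build_replacement(replace: Dict[str, Any]) -> Any:
    # Instantiate the target class with the rule's kwargs. A real framework
    # would also pass the original module, its weights, and `device`.
    cls = resolve_class(replace["class"])
    return cls(**replace.get("kwargs", {}))


# A Python dict mirroring a YAML `replace` section; OrderedDict stands in
# for an optimized operator class.
rule_replace = {
    "class": "collections.OrderedDict",
    "device": "cpu",
    "kwargs": {"generate_device": "cuda"},
}
obj = build_replacement(rule_replace)
print(type(obj).__name__, dict(obj))
# → OrderedDict {'generate_device': 'cuda'}
```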
+
+You can find example rule templates for optimizing DeepSeek-V2 and Qwen2-57B-A14B, two SOTA MoE models, in the [ktransformers/optimize/optimize_rules](ktransformers/optimize/optimize_rules) directory. These templates are used to power the `local_chat.py` demo.
+
+If you are interested in our design principles and the implementation of the injection framework, please refer to the [design document](doc/en/deepseek-v2-injection.md).
+
+## 🔥 Citation
+
+If you use KTransformers for your research, please cite our [paper](https://madsys.cs.tsinghua.edu.cn/publication/ktransformers-unleashing-the-full-potential-of-cpu/gpu-hybrid-inference-for-moe-models/):
+
+```
+@inproceedings{10.1145/3731569.3764843,
+title = {KTransformers: Unleashing the Full Potential of CPU/GPU Hybrid Inference for MoE Models},
+author = {Chen, Hongtao and Xie, Weiyu and Zhang, Boxin and Tang, Jingqi and Wang, Jiahao and Dong, Jianwei and Chen, Shaoyuan and Yuan, Ziwei and Lin, Chen and Qiu, Chengyu and Zhu, Yuening and Ou, Qingliang and Liao, Jiaqi and Chen, Xianglin and Ai, Zhiyuan and Wu, Yongwei and Zhang, Mingxing},
+booktitle = {Proceedings of the ACM SIGOPS 31st Symposium on Operating Systems Principles},
+year = {2025}
+}
+```
+
+## Acknowledgment and Contributors
+
+The development of KTransformers is based on the flexible and versatile framework provided by Transformers. We also benefit from advanced kernels such as GGUF/GGML, Llamafile, Marlin, sglang and flashinfer. We are planning to contribute back to the community by upstreaming our modifications.
+
+KTransformers is actively maintained and developed by contributors from the MADSys group at Tsinghua University and members from Approaching.AI. We welcome new contributors to join us in making KTransformers faster and easier to use.
+
+## Discussion
+
+If you have any questions, feel free to open an issue. Alternatively, you can join our WeChat group for further discussion. QR Code: [WeChat Group](WeChatGroup.png)
+
+## 🙋 FAQ
+
+Some common questions are answered in the [FAQ](doc/en/FAQ.md).
+
diff --git a/archive/README_ZH_LEGACY.md b/archive/README_ZH_LEGACY.md
new file mode 100644
index 0000000..48f28e0
--- /dev/null
+++ b/archive/README_ZH_LEGACY.md
@@ -0,0 +1,166 @@
+
+
+## 🎉 Introduction
+KTransformers (pronounced Quick Transformers) is designed to enhance your 🤗 [Transformers](https://github.com/huggingface/transformers) experience with advanced kernel optimizations and placement/parallelism strategies.
+
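+KTransformers is a flexible, Python-centric framework designed with extensibility at its core. By implementing and injecting an optimized module with a single line of code, users gain access to a Transformers-compatible interface, RESTful APIs compliant with OpenAI and Ollama, and even a simplified ChatGPT-like web UI.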
+
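+Our vision for KTransformers is to serve as a flexible platform for experimenting with innovative LLM inference optimizations. Please let us know if you need any other features.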
+
+## 🔥 Updates
+
+* **Feb 25, 2025**: Support [FP8 GPU kernel](./doc/en/fp8_kernel.md) for DeepSeek-V3/R1; support longer context ([Tutorial](./doc/en/DeepseekR1_V3_tutorial.md#v022-longer-context)).
+* **Feb 15, 2025**: Longer context (from 4K to 8K for 24GB VRAM) & slightly faster speed (+15%, up to 16 tokens/s); see the docs [here](./doc/en/DeepseekR1_V3_tutorial.md) and the [online guide](https://kvcache-ai.github.io/ktransformers/).
+* **Feb 10, 2025**: Support DeepSeek-R1 and V3 on a single GPU (24GB VRAM) or multiple GPUs with 382GB DRAM, achieving up to 3~28x speedup. For a detailed tutorial, see [here](./doc/en/DeepseekR1_V3_tutorial.md).
+* **Aug 28, 2024**: Support 1M context for the InternLM2.5-7B-Chat-1M model, using 24GB VRAM and 150GB DRAM. For a detailed tutorial, see [here](./doc/en/long_context_tutorial.md).
+* **Aug 28, 2024**: Decrease DeepseekV2's required VRAM from 21GB to 11GB.
+* **Aug 15, 2024**: Update the detailed [tutorial](doc/en/injection_tutorial.md) on injection and multi-GPU usage.
+* **Aug 14, 2024**: Support llamafile as a linear backend.
+* **Aug 12, 2024**: Support multiple GPUs; support new models Mixtral 8\*7B and 8\*22B; support q2k, q3k, and q5k dequantization on GPU.
+* **Aug 9, 2024**: Support Windows.
+
+## 🌟 Show Cases
+
+
+
+### Running a GPT-4/o1-level Local VSCode Copilot on a Desktop with Only 24GB VRAM
+
+
+https://github.com/user-attachments/assets/ebd70bfa-b2c1-4abb-ae3b-296ed38aa285
+
+
+
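+- **[NEW!!!] Local 671B DeepSeek-Coder-V3/R1**: Runs its Q4_K_M version with only 14GB VRAM and 382GB DRAM (see the tutorial [here](./doc/en/DeepseekR1_V3_tutorial.md)).
+  - Prefill speed (tokens/s):
+    - KTransformers: 54.21 (32 cores) → 74.362 (dual-socket, 2×32 cores) → 255.26 (optimized AMX-based MoE kernel, V0.3 only) → 286.55 (selectively using 6 experts, V0.3 only)
+    - Compared to llama.cpp with 2×32 cores, achieving a **27.79× speedup**.
+  - Decode speed (tokens/s):
+    - KTransformers: 8.73 (32 cores) → 11.26 (dual-socket, 2×32 cores) → 13.69 (selectively using 6 experts, V0.3 only)
+    - Compared to llama.cpp with 2×32 cores, achieving a **3.03× speedup**.
+  - Upcoming open-source release:
+    - AMX optimizations and selective expert activation will be open-sourced in V0.3.
+    - Currently available only as a preview binary distribution, which can be downloaded [here](./doc/en/DeepseekR1_V3_tutorial.md).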
+
+- **Local 236B DeepSeek-Coder-V2**: Runs its Q4_K_M version with only 21GB VRAM and 136GB DRAM, and even scores higher than GPT4-0613 on [BigCodeBench](https://huggingface.co/blog/leaderboard-bigcodebench).
+
+
+
+
+
+
+
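+- **Faster speed**: Achieves 126 tokens/s for 2K-prompt prefill and 13.6 tokens/s for generation through MoE offloading and by injecting advanced kernels from [Llamafile](https://github.com/Mozilla-Ocho/llamafile/tree/main) and [Marlin](https://github.com/IST-DASLab/marlin).
+- **VSCode integration**: Wrapped into an OpenAI- and Ollama-compatible API that integrates seamlessly as a backend for [Tabby](https://github.com/TabbyML/tabby) and various other frontends.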
+
+
+
+https://github.com/user-attachments/assets/4c6a8a38-05aa-497d-8eb1-3a5b3918429c
+
+
+
+
+
+
+
+
+More advanced features are coming soon, so stay tuned!
+
+## 🚀 Quick Start
+
+
+Getting started with KTransformers is simple! Please follow our [Installation Guide](https://kvcache-ai.github.io/ktransformers/) to install it.
+
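+## 📃 Brief Injection Tutorial
+
+At the heart of KTransformers is a user-friendly, template-based injection framework. This allows researchers to easily replace original torch modules with optimized variants. It also simplifies combining multiple optimizations, allowing the exploration of their synergistic effects.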
+
+
+
+
+
+
+
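+Given that vLLM is already an excellent framework for large-scale deployment optimizations, KTransformers focuses specifically on local deployments constrained by limited resources. We pay special attention to heterogeneous computing opportunities, such as GPU/CPU offloading of quantized models. For example, we support the efficient Llamafile and Marlin kernels for CPU and GPU, respectively. More details can be found here.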
+
+
+### Example Usage
+To use the provided kernels, users only need to create a YAML-based injection template and add a call to `optimize_and_load_gguf` before using the Transformers model.
+
+```python
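+import torch
+from transformers import AutoModelForCausalLM
+from ktransformers.optimize.optimize import optimize_and_load_gguf
+from ktransformers.util.utils import prefill_and_generate
+
+with torch.device("meta"):  # build the model skeleton without allocating weight memory
+    model = AutoModelForCausalLM.from_config(config, trust_remote_code=True)
+optimize_and_load_gguf(model, optimize_config_path, gguf_path, config)  # inject modules per YAML rules, load GGUF weights
+...
+generated = prefill_and_generate(model, tokenizer, input_tensor.cuda(), max_new_tokens=1000)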
+```
+
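+In this example, the AutoModel is first initialized on the meta device to avoid occupying any memory resources. Then, `optimize_and_load_gguf` iterates through all sub-modules of the model, matches the rules specified in your YAML rule file, and replaces them with the specified advanced modules.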
+
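+After injection, the original `generate` interface is still available, but we also provide a compatible `prefill_and_generate` method, which enables further optimizations such as using CUDAGraph to improve generation speed.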
+
+### How to Customize Your Model
+
+A detailed tutorial on injection and multi-GPU usage, using DeepSeek-V2 as an example, is available [here](doc/en/injection_tutorial.md).
+
+Below is an example of a YAML template that replaces all original Linear modules with Marlin, an advanced 4-bit quantization kernel.
+
+```yaml
+- match:
+    name: "^model\\.layers\\..*$"  # regular expression
+    class: torch.nn.Linear  # only match modules matching both name and class
+  replace:
+    class: ktransformers.operators.linear.KTransformerLinear  # optimized kernel on quantized data types
+    device: "cpu"  # device on which to load this module during initialization
+    kwargs:
+      generate_device: "cuda"
+      generate_linear_type: "QuantizedLinearMarlin"
+```
+
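+Each rule in the YAML file has two parts: `match` and `replace`. The `match` part specifies which module should be replaced, and the `replace` part specifies the module to be injected into the model along with its initialization keywords.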
+
+You can find example rule templates for optimizing DeepSeek-V2 and Qwen2-57B-A14B in the [ktransformers/optimize/optimize_rules](ktransformers/optimize/optimize_rules) directory. These templates are used to power the `local_chat.py` demo.
+
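+If you are interested in our design principles and the implementation of the injection framework, please refer to the [design document](doc/en/deepseek-v2-injection.md).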
+
+## Acknowledgment and Contributors
+
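+The development of KTransformers is based on the flexible and versatile framework provided by Transformers. We also benefit from advanced kernels such as GGUF/GGML, Llamafile, Marlin, sglang, and flashinfer. We plan to give back to the community by upstreaming our modifications.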
+
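+KTransformers is actively maintained and developed by members of the MADSys group at Tsinghua University and members of Approaching.AI. We welcome new contributors to join us in making KTransformers faster and easier to use.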
+
+
+## Discussion
+
+If you have any questions, feel free to open an issue. Alternatively, you can join our WeChat group for further discussion. QR code: [WeChat Group](WeChatGroup.png)
+
+## 🙋 FAQ
+
+Answers to some common questions can be found in the [FAQ](doc/en/FAQ.md).
diff --git a/SECURITY.md b/archive/SECURITY.md
similarity index 100%
rename from SECURITY.md
rename to archive/SECURITY.md
diff --git a/WeChatGroup.png b/archive/WeChatGroup.png
similarity index 100%
rename from WeChatGroup.png
rename to archive/WeChatGroup.png
diff --git a/book.toml b/archive/book.toml
similarity index 100%
rename from book.toml
rename to archive/book.toml
diff --git a/config.json b/archive/config.json
similarity index 100%
rename from config.json
rename to archive/config.json
diff --git a/csrc/balance_serve/CMakeLists.txt b/archive/csrc/balance_serve/CMakeLists.txt
similarity index 100%
rename from csrc/balance_serve/CMakeLists.txt
rename to archive/csrc/balance_serve/CMakeLists.txt
diff --git a/csrc/balance_serve/kvc2/.clang-format b/archive/csrc/balance_serve/kvc2/.clang-format
similarity index 100%
rename from csrc/balance_serve/kvc2/.clang-format
rename to archive/csrc/balance_serve/kvc2/.clang-format
diff --git a/csrc/balance_serve/kvc2/CMakeLists.txt b/archive/csrc/balance_serve/kvc2/CMakeLists.txt
similarity index 100%
rename from csrc/balance_serve/kvc2/CMakeLists.txt
rename to archive/csrc/balance_serve/kvc2/CMakeLists.txt
diff --git a/csrc/balance_serve/kvc2/README.md b/archive/csrc/balance_serve/kvc2/README.md
similarity index 100%
rename from csrc/balance_serve/kvc2/README.md
rename to archive/csrc/balance_serve/kvc2/README.md
diff --git a/csrc/balance_serve/kvc2/config/model_configs.json b/archive/csrc/balance_serve/kvc2/config/model_configs.json
similarity index 100%
rename from csrc/balance_serve/kvc2/config/model_configs.json
rename to archive/csrc/balance_serve/kvc2/config/model_configs.json
diff --git a/csrc/balance_serve/kvc2/config/quant_configs.json b/archive/csrc/balance_serve/kvc2/config/quant_configs.json
similarity index 100%
rename from csrc/balance_serve/kvc2/config/quant_configs.json
rename to archive/csrc/balance_serve/kvc2/config/quant_configs.json
diff --git a/csrc/balance_serve/kvc2/export_envs_before_run.sh b/archive/csrc/balance_serve/kvc2/export_envs_before_run.sh
similarity index 100%
rename from csrc/balance_serve/kvc2/export_envs_before_run.sh
rename to archive/csrc/balance_serve/kvc2/export_envs_before_run.sh
diff --git a/csrc/balance_serve/kvc2/install_deps.sh b/archive/csrc/balance_serve/kvc2/install_deps.sh
similarity index 100%
rename from csrc/balance_serve/kvc2/install_deps.sh
rename to archive/csrc/balance_serve/kvc2/install_deps.sh
diff --git a/csrc/balance_serve/kvc2/mkfs.sh b/archive/csrc/balance_serve/kvc2/mkfs.sh
similarity index 100%
rename from csrc/balance_serve/kvc2/mkfs.sh
rename to archive/csrc/balance_serve/kvc2/mkfs.sh
diff --git a/csrc/balance_serve/kvc2/src/CMakeLists.txt b/archive/csrc/balance_serve/kvc2/src/CMakeLists.txt
similarity index 100%
rename from csrc/balance_serve/kvc2/src/CMakeLists.txt
rename to archive/csrc/balance_serve/kvc2/src/CMakeLists.txt
diff --git a/csrc/balance_serve/kvc2/src/async_store.cpp b/archive/csrc/balance_serve/kvc2/src/async_store.cpp
similarity index 100%
rename from csrc/balance_serve/kvc2/src/async_store.cpp
rename to archive/csrc/balance_serve/kvc2/src/async_store.cpp
diff --git a/csrc/balance_serve/kvc2/src/async_store.hh b/archive/csrc/balance_serve/kvc2/src/async_store.hh
similarity index 100%
rename from csrc/balance_serve/kvc2/src/async_store.hh
rename to archive/csrc/balance_serve/kvc2/src/async_store.hh
diff --git a/csrc/balance_serve/kvc2/src/bind.cpp b/archive/csrc/balance_serve/kvc2/src/bind.cpp
similarity index 100%
rename from csrc/balance_serve/kvc2/src/bind.cpp
rename to archive/csrc/balance_serve/kvc2/src/bind.cpp
diff --git a/csrc/balance_serve/kvc2/src/cache_entry.cpp b/archive/csrc/balance_serve/kvc2/src/cache_entry.cpp
similarity index 100%
rename from csrc/balance_serve/kvc2/src/cache_entry.cpp
rename to archive/csrc/balance_serve/kvc2/src/cache_entry.cpp
diff --git a/csrc/balance_serve/kvc2/src/cache_entry.hh b/archive/csrc/balance_serve/kvc2/src/cache_entry.hh
similarity index 100%
rename from csrc/balance_serve/kvc2/src/cache_entry.hh
rename to archive/csrc/balance_serve/kvc2/src/cache_entry.hh
diff --git a/csrc/balance_serve/kvc2/src/common.h b/archive/csrc/balance_serve/kvc2/src/common.h
similarity index 100%
rename from csrc/balance_serve/kvc2/src/common.h
rename to archive/csrc/balance_serve/kvc2/src/common.h
diff --git a/csrc/balance_serve/kvc2/src/cuda_stream_manager.cpp b/archive/csrc/balance_serve/kvc2/src/cuda_stream_manager.cpp
similarity index 100%
rename from csrc/balance_serve/kvc2/src/cuda_stream_manager.cpp
rename to archive/csrc/balance_serve/kvc2/src/cuda_stream_manager.cpp
diff --git a/csrc/balance_serve/kvc2/src/cuda_stream_manager.hh b/archive/csrc/balance_serve/kvc2/src/cuda_stream_manager.hh
similarity index 100%
rename from csrc/balance_serve/kvc2/src/cuda_stream_manager.hh
rename to archive/csrc/balance_serve/kvc2/src/cuda_stream_manager.hh
diff --git a/csrc/balance_serve/kvc2/src/defs.h b/archive/csrc/balance_serve/kvc2/src/defs.h
similarity index 100%
rename from csrc/balance_serve/kvc2/src/defs.h
rename to archive/csrc/balance_serve/kvc2/src/defs.h
diff --git a/csrc/balance_serve/kvc2/src/gpu_cache.cpp b/archive/csrc/balance_serve/kvc2/src/gpu_cache.cpp
similarity index 100%
rename from csrc/balance_serve/kvc2/src/gpu_cache.cpp
rename to archive/csrc/balance_serve/kvc2/src/gpu_cache.cpp
diff --git a/csrc/balance_serve/kvc2/src/gpu_cache.hh b/archive/csrc/balance_serve/kvc2/src/gpu_cache.hh
similarity index 100%
rename from csrc/balance_serve/kvc2/src/gpu_cache.hh
rename to archive/csrc/balance_serve/kvc2/src/gpu_cache.hh
diff --git a/csrc/balance_serve/kvc2/src/hasher.hpp b/archive/csrc/balance_serve/kvc2/src/hasher.hpp
similarity index 100%
rename from csrc/balance_serve/kvc2/src/hasher.hpp
rename to archive/csrc/balance_serve/kvc2/src/hasher.hpp
diff --git a/csrc/balance_serve/kvc2/src/io_helper.hpp b/archive/csrc/balance_serve/kvc2/src/io_helper.hpp
similarity index 100%
rename from csrc/balance_serve/kvc2/src/io_helper.hpp
rename to archive/csrc/balance_serve/kvc2/src/io_helper.hpp
diff --git a/csrc/balance_serve/kvc2/src/kvc2.h b/archive/csrc/balance_serve/kvc2/src/kvc2.h
similarity index 100%
rename from csrc/balance_serve/kvc2/src/kvc2.h
rename to archive/csrc/balance_serve/kvc2/src/kvc2.h
diff --git a/csrc/balance_serve/kvc2/src/kvc2_utils.py b/archive/csrc/balance_serve/kvc2/src/kvc2_utils.py
similarity index 100%
rename from csrc/balance_serve/kvc2/src/kvc2_utils.py
rename to archive/csrc/balance_serve/kvc2/src/kvc2_utils.py
diff --git a/csrc/balance_serve/kvc2/src/metrics.cpp b/archive/csrc/balance_serve/kvc2/src/metrics.cpp
similarity index 100%
rename from csrc/balance_serve/kvc2/src/metrics.cpp
rename to archive/csrc/balance_serve/kvc2/src/metrics.cpp
diff --git a/csrc/balance_serve/kvc2/src/metrics.h b/archive/csrc/balance_serve/kvc2/src/metrics.h
similarity index 100%
rename from csrc/balance_serve/kvc2/src/metrics.h
rename to archive/csrc/balance_serve/kvc2/src/metrics.h
diff --git a/csrc/balance_serve/kvc2/src/model_config.h b/archive/csrc/balance_serve/kvc2/src/model_config.h
similarity index 100%
rename from csrc/balance_serve/kvc2/src/model_config.h
rename to archive/csrc/balance_serve/kvc2/src/model_config.h
diff --git a/csrc/balance_serve/kvc2/src/page_aligned_memory_pool.cpp b/archive/csrc/balance_serve/kvc2/src/page_aligned_memory_pool.cpp
similarity index 100%
rename from csrc/balance_serve/kvc2/src/page_aligned_memory_pool.cpp
rename to archive/csrc/balance_serve/kvc2/src/page_aligned_memory_pool.cpp
diff --git a/csrc/balance_serve/kvc2/src/page_aligned_memory_pool.h b/archive/csrc/balance_serve/kvc2/src/page_aligned_memory_pool.h
similarity index 100%
rename from csrc/balance_serve/kvc2/src/page_aligned_memory_pool.h
rename to archive/csrc/balance_serve/kvc2/src/page_aligned_memory_pool.h
diff --git a/csrc/balance_serve/kvc2/src/prefix.cpp b/archive/csrc/balance_serve/kvc2/src/prefix.cpp
similarity index 100%
rename from csrc/balance_serve/kvc2/src/prefix.cpp
rename to archive/csrc/balance_serve/kvc2/src/prefix.cpp
diff --git a/csrc/balance_serve/kvc2/src/utils/all.hpp b/archive/csrc/balance_serve/kvc2/src/utils/all.hpp
similarity index 100%
rename from csrc/balance_serve/kvc2/src/utils/all.hpp
rename to archive/csrc/balance_serve/kvc2/src/utils/all.hpp
diff --git a/csrc/balance_serve/kvc2/src/utils/arithmetic.hpp b/archive/csrc/balance_serve/kvc2/src/utils/arithmetic.hpp
similarity index 100%
rename from csrc/balance_serve/kvc2/src/utils/arithmetic.hpp
rename to archive/csrc/balance_serve/kvc2/src/utils/arithmetic.hpp
diff --git a/csrc/balance_serve/kvc2/src/utils/easy_format.hpp b/archive/csrc/balance_serve/kvc2/src/utils/easy_format.hpp
similarity index 100%
rename from csrc/balance_serve/kvc2/src/utils/easy_format.hpp
rename to archive/csrc/balance_serve/kvc2/src/utils/easy_format.hpp
diff --git a/csrc/balance_serve/kvc2/src/utils/lock_free_queue.hpp b/archive/csrc/balance_serve/kvc2/src/utils/lock_free_queue.hpp
similarity index 100%
rename from csrc/balance_serve/kvc2/src/utils/lock_free_queue.hpp
rename to archive/csrc/balance_serve/kvc2/src/utils/lock_free_queue.hpp
diff --git a/csrc/balance_serve/kvc2/src/utils/mpsc.hpp b/archive/csrc/balance_serve/kvc2/src/utils/mpsc.hpp
similarity index 100%
rename from csrc/balance_serve/kvc2/src/utils/mpsc.hpp
rename to archive/csrc/balance_serve/kvc2/src/utils/mpsc.hpp
diff --git a/csrc/balance_serve/kvc2/src/utils/mutex_extend.hpp b/archive/csrc/balance_serve/kvc2/src/utils/mutex_extend.hpp
similarity index 100%
rename from csrc/balance_serve/kvc2/src/utils/mutex_extend.hpp
rename to archive/csrc/balance_serve/kvc2/src/utils/mutex_extend.hpp
diff --git a/csrc/balance_serve/kvc2/src/utils/periodic_task.hpp b/archive/csrc/balance_serve/kvc2/src/utils/periodic_task.hpp
similarity index 100%
rename from csrc/balance_serve/kvc2/src/utils/periodic_task.hpp
rename to archive/csrc/balance_serve/kvc2/src/utils/periodic_task.hpp
diff --git a/csrc/balance_serve/kvc2/src/utils/spin_lock.hpp b/archive/csrc/balance_serve/kvc2/src/utils/spin_lock.hpp
similarity index 100%
rename from csrc/balance_serve/kvc2/src/utils/spin_lock.hpp
rename to archive/csrc/balance_serve/kvc2/src/utils/spin_lock.hpp
diff --git a/csrc/balance_serve/kvc2/src/utils/timer.hpp b/archive/csrc/balance_serve/kvc2/src/utils/timer.hpp
similarity index 100%
rename from csrc/balance_serve/kvc2/src/utils/timer.hpp
rename to archive/csrc/balance_serve/kvc2/src/utils/timer.hpp
diff --git a/csrc/balance_serve/kvc2/test/CMakeLists.txt b/archive/csrc/balance_serve/kvc2/test/CMakeLists.txt
similarity index 100%
rename from csrc/balance_serve/kvc2/test/CMakeLists.txt
rename to archive/csrc/balance_serve/kvc2/test/CMakeLists.txt
diff --git a/csrc/balance_serve/kvc2/test/hashmap_test.cpp b/archive/csrc/balance_serve/kvc2/test/hashmap_test.cpp
similarity index 100%
rename from csrc/balance_serve/kvc2/test/hashmap_test.cpp
rename to archive/csrc/balance_serve/kvc2/test/hashmap_test.cpp
diff --git a/csrc/balance_serve/kvc2/test/kvc2_export_header_test.cpp b/archive/csrc/balance_serve/kvc2/test/kvc2_export_header_test.cpp
similarity index 100%
rename from csrc/balance_serve/kvc2/test/kvc2_export_header_test.cpp
rename to archive/csrc/balance_serve/kvc2/test/kvc2_export_header_test.cpp
diff --git a/csrc/balance_serve/kvc2/test/kvc2_export_load_test.cpp b/archive/csrc/balance_serve/kvc2/test/kvc2_export_load_test.cpp
similarity index 100%
rename from csrc/balance_serve/kvc2/test/kvc2_export_load_test.cpp
rename to archive/csrc/balance_serve/kvc2/test/kvc2_export_load_test.cpp
diff --git a/csrc/balance_serve/kvc2/test/kvc2_test_utils.cpp b/archive/csrc/balance_serve/kvc2/test/kvc2_test_utils.cpp
similarity index 100%
rename from csrc/balance_serve/kvc2/test/kvc2_test_utils.cpp
rename to archive/csrc/balance_serve/kvc2/test/kvc2_test_utils.cpp
diff --git a/csrc/balance_serve/kvc2/test/kvc2test/CMakeLists.txt b/archive/csrc/balance_serve/kvc2/test/kvc2test/CMakeLists.txt
similarity index 100%
rename from csrc/balance_serve/kvc2/test/kvc2test/CMakeLists.txt
rename to archive/csrc/balance_serve/kvc2/test/kvc2test/CMakeLists.txt
diff --git a/csrc/balance_serve/kvc2/test/kvc2test/append-tokens.cpp b/archive/csrc/balance_serve/kvc2/test/kvc2test/append-tokens.cpp
similarity index 100%
rename from csrc/balance_serve/kvc2/test/kvc2test/append-tokens.cpp
rename to archive/csrc/balance_serve/kvc2/test/kvc2test/append-tokens.cpp
diff --git a/csrc/balance_serve/kvc2/test/kvc2test/check-flush-back.cpp b/archive/csrc/balance_serve/kvc2/test/kvc2test/check-flush-back.cpp
similarity index 100%
rename from csrc/balance_serve/kvc2/test/kvc2test/check-flush-back.cpp
rename to archive/csrc/balance_serve/kvc2/test/kvc2test/check-flush-back.cpp
diff --git a/csrc/balance_serve/kvc2/test/kvc2test/common.hpp b/archive/csrc/balance_serve/kvc2/test/kvc2test/common.hpp
similarity index 100%
rename from csrc/balance_serve/kvc2/test/kvc2test/common.hpp
rename to archive/csrc/balance_serve/kvc2/test/kvc2test/common.hpp
diff --git a/csrc/balance_serve/kvc2/test/kvc2test/flush-back.cpp b/archive/csrc/balance_serve/kvc2/test/kvc2test/flush-back.cpp
similarity index 100%
rename from csrc/balance_serve/kvc2/test/kvc2test/flush-back.cpp
rename to archive/csrc/balance_serve/kvc2/test/kvc2test/flush-back.cpp
diff --git a/csrc/balance_serve/kvc2/test/kvc2test/lookup-alt-gpu.cpp b/archive/csrc/balance_serve/kvc2/test/kvc2test/lookup-alt-gpu.cpp
similarity index 100%
rename from csrc/balance_serve/kvc2/test/kvc2test/lookup-alt-gpu.cpp
rename to archive/csrc/balance_serve/kvc2/test/kvc2test/lookup-alt-gpu.cpp
diff --git a/csrc/balance_serve/kvc2/test/kvc2test/lookup-alt.cpp b/archive/csrc/balance_serve/kvc2/test/kvc2test/lookup-alt.cpp
similarity index 100%
rename from csrc/balance_serve/kvc2/test/kvc2test/lookup-alt.cpp
rename to archive/csrc/balance_serve/kvc2/test/kvc2test/lookup-alt.cpp
diff --git a/csrc/balance_serve/kvc2/test/kvc2test/lookup-gpu-async.cpp b/archive/csrc/balance_serve/kvc2/test/kvc2test/lookup-gpu-async.cpp
similarity index 100%
rename from csrc/balance_serve/kvc2/test/kvc2test/lookup-gpu-async.cpp
rename to archive/csrc/balance_serve/kvc2/test/kvc2test/lookup-gpu-async.cpp
diff --git a/csrc/balance_serve/kvc2/test/kvc2test/lookup-gpu-mt-without-vcache.cpp b/archive/csrc/balance_serve/kvc2/test/kvc2test/lookup-gpu-mt-without-vcache.cpp
similarity index 100%
rename from csrc/balance_serve/kvc2/test/kvc2test/lookup-gpu-mt-without-vcache.cpp
rename to archive/csrc/balance_serve/kvc2/test/kvc2test/lookup-gpu-mt-without-vcache.cpp
diff --git a/csrc/balance_serve/kvc2/test/kvc2test/lookup-gpu-mt.cpp b/archive/csrc/balance_serve/kvc2/test/kvc2test/lookup-gpu-mt.cpp
similarity index 100%
rename from csrc/balance_serve/kvc2/test/kvc2test/lookup-gpu-mt.cpp
rename to archive/csrc/balance_serve/kvc2/test/kvc2test/lookup-gpu-mt.cpp
diff --git a/csrc/balance_serve/kvc2/test/kvc2test/lookup-gpu.cpp b/archive/csrc/balance_serve/kvc2/test/kvc2test/lookup-gpu.cpp
similarity index 100%
rename from csrc/balance_serve/kvc2/test/kvc2test/lookup-gpu.cpp
rename to archive/csrc/balance_serve/kvc2/test/kvc2test/lookup-gpu.cpp
diff --git a/csrc/balance_serve/kvc2/test/kvc2test/lookup-mt.cpp b/archive/csrc/balance_serve/kvc2/test/kvc2test/lookup-mt.cpp
similarity index 100%
rename from csrc/balance_serve/kvc2/test/kvc2test/lookup-mt.cpp
rename to archive/csrc/balance_serve/kvc2/test/kvc2test/lookup-mt.cpp
diff --git a/csrc/balance_serve/kvc2/test/kvc2test/lookup-without-vcache.cpp b/archive/csrc/balance_serve/kvc2/test/kvc2test/lookup-without-vcache.cpp
similarity index 100%
rename from csrc/balance_serve/kvc2/test/kvc2test/lookup-without-vcache.cpp
rename to archive/csrc/balance_serve/kvc2/test/kvc2test/lookup-without-vcache.cpp
diff --git a/csrc/balance_serve/kvc2/test/kvc2test/lookup.cpp b/archive/csrc/balance_serve/kvc2/test/kvc2test/lookup.cpp
similarity index 100%
rename from csrc/balance_serve/kvc2/test/kvc2test/lookup.cpp
rename to archive/csrc/balance_serve/kvc2/test/kvc2test/lookup.cpp
diff --git a/csrc/balance_serve/kvc2/test/kvc2test/raw_insert_read.cpp b/archive/csrc/balance_serve/kvc2/test/kvc2test/raw_insert_read.cpp
similarity index 100%
rename from csrc/balance_serve/kvc2/test/kvc2test/raw_insert_read.cpp
rename to archive/csrc/balance_serve/kvc2/test/kvc2test/raw_insert_read.cpp
diff --git a/csrc/balance_serve/kvc2/test/kvcache_disk_insert_read_test.cpp b/archive/csrc/balance_serve/kvc2/test/kvcache_disk_insert_read_test.cpp
similarity index 100%
rename from csrc/balance_serve/kvc2/test/kvcache_disk_insert_read_test.cpp
rename to archive/csrc/balance_serve/kvc2/test/kvcache_disk_insert_read_test.cpp
diff --git a/csrc/balance_serve/kvc2/test/kvcache_mem_eviction_test.cpp b/archive/csrc/balance_serve/kvc2/test/kvcache_mem_eviction_test.cpp
similarity index 100%
rename from csrc/balance_serve/kvc2/test/kvcache_mem_eviction_test.cpp
rename to archive/csrc/balance_serve/kvc2/test/kvcache_mem_eviction_test.cpp
diff --git a/csrc/balance_serve/kvc2/test/kvcache_mem_insert_read_test.cpp b/archive/csrc/balance_serve/kvc2/test/kvcache_mem_insert_read_test.cpp
similarity index 100%
rename from csrc/balance_serve/kvc2/test/kvcache_mem_insert_read_test.cpp
rename to archive/csrc/balance_serve/kvc2/test/kvcache_mem_insert_read_test.cpp
diff --git a/csrc/balance_serve/kvc2/test/kvcache_save_load_test.cpp b/archive/csrc/balance_serve/kvc2/test/kvcache_save_load_test.cpp
similarity index 100%
rename from csrc/balance_serve/kvc2/test/kvcache_save_load_test.cpp
rename to archive/csrc/balance_serve/kvc2/test/kvcache_save_load_test.cpp
diff --git a/csrc/balance_serve/kvc2/test/kvcache_test_utils.cpp b/archive/csrc/balance_serve/kvc2/test/kvcache_test_utils.cpp
similarity index 100%
rename from csrc/balance_serve/kvc2/test/kvcache_test_utils.cpp
rename to archive/csrc/balance_serve/kvc2/test/kvcache_test_utils.cpp
diff --git a/csrc/balance_serve/kvc2/test/page_pool_test.cpp b/archive/csrc/balance_serve/kvc2/test/page_pool_test.cpp
similarity index 100%
rename from csrc/balance_serve/kvc2/test/page_pool_test.cpp
rename to archive/csrc/balance_serve/kvc2/test/page_pool_test.cpp
diff --git a/csrc/balance_serve/kvc2/test/prefix_test.cpp b/archive/csrc/balance_serve/kvc2/test/prefix_test.cpp
similarity index 100%
rename from csrc/balance_serve/kvc2/test/prefix_test.cpp
rename to archive/csrc/balance_serve/kvc2/test/prefix_test.cpp
diff --git a/csrc/balance_serve/kvc2/test/pytest_load.py b/archive/csrc/balance_serve/kvc2/test/pytest_load.py
similarity index 100%
rename from csrc/balance_serve/kvc2/test/pytest_load.py
rename to archive/csrc/balance_serve/kvc2/test/pytest_load.py
diff --git a/csrc/balance_serve/kvc2/test/pytest_mem_prefix_test.py b/archive/csrc/balance_serve/kvc2/test/pytest_mem_prefix_test.py
similarity index 100%
rename from csrc/balance_serve/kvc2/test/pytest_mem_prefix_test.py
rename to archive/csrc/balance_serve/kvc2/test/pytest_mem_prefix_test.py
diff --git a/csrc/balance_serve/kvc2/test/pytest_mem_read.py b/archive/csrc/balance_serve/kvc2/test/pytest_mem_read.py
similarity index 100%
rename from csrc/balance_serve/kvc2/test/pytest_mem_read.py
rename to archive/csrc/balance_serve/kvc2/test/pytest_mem_read.py
diff --git a/csrc/balance_serve/kvc2/test/pytest_raw_insert_and_read.py b/archive/csrc/balance_serve/kvc2/test/pytest_raw_insert_and_read.py
similarity index 100%
rename from csrc/balance_serve/kvc2/test/pytest_raw_insert_and_read.py
rename to archive/csrc/balance_serve/kvc2/test/pytest_raw_insert_and_read.py
diff --git a/csrc/balance_serve/kvc2/test/test_align.py b/archive/csrc/balance_serve/kvc2/test/test_align.py
similarity index 100%
rename from csrc/balance_serve/kvc2/test/test_align.py
rename to archive/csrc/balance_serve/kvc2/test/test_align.py
diff --git a/csrc/balance_serve/kvc2/test/test_cuda_stream.cpp b/archive/csrc/balance_serve/kvc2/test/test_cuda_stream.cpp
similarity index 100%
rename from csrc/balance_serve/kvc2/test/test_cuda_stream.cpp
rename to archive/csrc/balance_serve/kvc2/test/test_cuda_stream.cpp
diff --git a/csrc/balance_serve/kvc2/test/test_cuda_stream_manager.cpp b/archive/csrc/balance_serve/kvc2/test/test_cuda_stream_manager.cpp
similarity index 100%
rename from csrc/balance_serve/kvc2/test/test_cuda_stream_manager.cpp
rename to archive/csrc/balance_serve/kvc2/test/test_cuda_stream_manager.cpp
diff --git a/csrc/balance_serve/kvc2/test/test_lock_free_queue.cpp b/archive/csrc/balance_serve/kvc2/test/test_lock_free_queue.cpp
similarity index 100%
rename from csrc/balance_serve/kvc2/test/test_lock_free_queue.cpp
rename to archive/csrc/balance_serve/kvc2/test/test_lock_free_queue.cpp
diff --git a/csrc/balance_serve/kvc2/test/test_periodic_task.cpp b/archive/csrc/balance_serve/kvc2/test/test_periodic_task.cpp
similarity index 100%
rename from csrc/balance_serve/kvc2/test/test_periodic_task.cpp
rename to archive/csrc/balance_serve/kvc2/test/test_periodic_task.cpp
diff --git a/csrc/balance_serve/kvc2/test/test_queue_perf.cpp b/archive/csrc/balance_serve/kvc2/test/test_queue_perf.cpp
similarity index 100%
rename from csrc/balance_serve/kvc2/test/test_queue_perf.cpp
rename to archive/csrc/balance_serve/kvc2/test/test_queue_perf.cpp
diff --git a/csrc/balance_serve/kvc2/test/test_std_list.cpp b/archive/csrc/balance_serve/kvc2/test/test_std_list.cpp
similarity index 100%
rename from csrc/balance_serve/kvc2/test/test_std_list.cpp
rename to archive/csrc/balance_serve/kvc2/test/test_std_list.cpp
diff --git a/csrc/balance_serve/kvc2/test/xxHash_test.cpp b/archive/csrc/balance_serve/kvc2/test/xxHash_test.cpp
similarity index 100%
rename from csrc/balance_serve/kvc2/test/xxHash_test.cpp
rename to archive/csrc/balance_serve/kvc2/test/xxHash_test.cpp
diff --git a/csrc/balance_serve/kvc2/unit_test.sh b/archive/csrc/balance_serve/kvc2/unit_test.sh
similarity index 100%
rename from csrc/balance_serve/kvc2/unit_test.sh
rename to archive/csrc/balance_serve/kvc2/unit_test.sh
diff --git a/csrc/balance_serve/sched/CMakeLists.txt b/archive/csrc/balance_serve/sched/CMakeLists.txt
similarity index 100%
rename from csrc/balance_serve/sched/CMakeLists.txt
rename to archive/csrc/balance_serve/sched/CMakeLists.txt
diff --git a/csrc/balance_serve/sched/bind.cpp b/archive/csrc/balance_serve/sched/bind.cpp
similarity index 100%
rename from csrc/balance_serve/sched/bind.cpp
rename to archive/csrc/balance_serve/sched/bind.cpp
diff --git a/csrc/balance_serve/sched/metrics.cpp b/archive/csrc/balance_serve/sched/metrics.cpp
similarity index 100%
rename from csrc/balance_serve/sched/metrics.cpp
rename to archive/csrc/balance_serve/sched/metrics.cpp
diff --git a/csrc/balance_serve/sched/metrics.h b/archive/csrc/balance_serve/sched/metrics.h
similarity index 100%
rename from csrc/balance_serve/sched/metrics.h
rename to archive/csrc/balance_serve/sched/metrics.h
diff --git a/csrc/balance_serve/sched/model_config.h b/archive/csrc/balance_serve/sched/model_config.h
similarity index 100%
rename from csrc/balance_serve/sched/model_config.h
rename to archive/csrc/balance_serve/sched/model_config.h
diff --git a/csrc/balance_serve/sched/scheduler.cpp b/archive/csrc/balance_serve/sched/scheduler.cpp
similarity index 100%
rename from csrc/balance_serve/sched/scheduler.cpp
rename to archive/csrc/balance_serve/sched/scheduler.cpp
diff --git a/csrc/balance_serve/sched/scheduler.h b/archive/csrc/balance_serve/sched/scheduler.h
similarity index 100%
rename from csrc/balance_serve/sched/scheduler.h
rename to archive/csrc/balance_serve/sched/scheduler.h
diff --git a/csrc/balance_serve/sched/utils/all.hpp b/archive/csrc/balance_serve/sched/utils/all.hpp
similarity index 100%
rename from csrc/balance_serve/sched/utils/all.hpp
rename to archive/csrc/balance_serve/sched/utils/all.hpp
diff --git a/csrc/balance_serve/sched/utils/arithmetic.hpp b/archive/csrc/balance_serve/sched/utils/arithmetic.hpp
similarity index 100%
rename from csrc/balance_serve/sched/utils/arithmetic.hpp
rename to archive/csrc/balance_serve/sched/utils/arithmetic.hpp
diff --git a/csrc/balance_serve/sched/utils/atomic_ptr_with_flags.hpp b/archive/csrc/balance_serve/sched/utils/atomic_ptr_with_flags.hpp
similarity index 100%
rename from csrc/balance_serve/sched/utils/atomic_ptr_with_flags.hpp
rename to archive/csrc/balance_serve/sched/utils/atomic_ptr_with_flags.hpp
diff --git a/csrc/balance_serve/sched/utils/csv.hpp b/archive/csrc/balance_serve/sched/utils/csv.hpp
similarity index 100%
rename from csrc/balance_serve/sched/utils/csv.hpp
rename to archive/csrc/balance_serve/sched/utils/csv.hpp
diff --git a/csrc/balance_serve/sched/utils/easy_format.hpp b/archive/csrc/balance_serve/sched/utils/easy_format.hpp
similarity index 100%
rename from csrc/balance_serve/sched/utils/easy_format.hpp
rename to archive/csrc/balance_serve/sched/utils/easy_format.hpp
diff --git a/csrc/balance_serve/sched/utils/mpsc.hpp b/archive/csrc/balance_serve/sched/utils/mpsc.hpp
similarity index 100%
rename from csrc/balance_serve/sched/utils/mpsc.hpp
rename to archive/csrc/balance_serve/sched/utils/mpsc.hpp
diff --git a/csrc/balance_serve/sched/utils/readable_number.hpp b/archive/csrc/balance_serve/sched/utils/readable_number.hpp
similarity index 100%
rename from csrc/balance_serve/sched/utils/readable_number.hpp
rename to archive/csrc/balance_serve/sched/utils/readable_number.hpp
diff --git a/csrc/balance_serve/sched/utils/statistics.hpp b/archive/csrc/balance_serve/sched/utils/statistics.hpp
similarity index 100%
rename from csrc/balance_serve/sched/utils/statistics.hpp
rename to archive/csrc/balance_serve/sched/utils/statistics.hpp
diff --git a/csrc/balance_serve/sched/utils/timer.hpp b/archive/csrc/balance_serve/sched/utils/timer.hpp
similarity index 100%
rename from csrc/balance_serve/sched/utils/timer.hpp
rename to archive/csrc/balance_serve/sched/utils/timer.hpp
diff --git a/csrc/custom_marlin/__init__.py b/archive/csrc/custom_marlin/__init__.py
similarity index 100%
rename from csrc/custom_marlin/__init__.py
rename to archive/csrc/custom_marlin/__init__.py
diff --git a/csrc/custom_marlin/binding.cpp b/archive/csrc/custom_marlin/binding.cpp
similarity index 100%
rename from csrc/custom_marlin/binding.cpp
rename to archive/csrc/custom_marlin/binding.cpp
diff --git a/csrc/custom_marlin/gptq_marlin/gptq_marlin.cu b/archive/csrc/custom_marlin/gptq_marlin/gptq_marlin.cu
similarity index 100%
rename from csrc/custom_marlin/gptq_marlin/gptq_marlin.cu
rename to archive/csrc/custom_marlin/gptq_marlin/gptq_marlin.cu
diff --git a/csrc/custom_marlin/gptq_marlin/gptq_marlin.cuh b/archive/csrc/custom_marlin/gptq_marlin/gptq_marlin.cuh
similarity index 100%
rename from csrc/custom_marlin/gptq_marlin/gptq_marlin.cuh
rename to archive/csrc/custom_marlin/gptq_marlin/gptq_marlin.cuh
diff --git a/csrc/custom_marlin/gptq_marlin/gptq_marlin_dtypes.cuh b/archive/csrc/custom_marlin/gptq_marlin/gptq_marlin_dtypes.cuh
similarity index 100%
rename from csrc/custom_marlin/gptq_marlin/gptq_marlin_dtypes.cuh
rename to archive/csrc/custom_marlin/gptq_marlin/gptq_marlin_dtypes.cuh
diff --git a/csrc/custom_marlin/gptq_marlin/gptq_marlin_repack.cu b/archive/csrc/custom_marlin/gptq_marlin/gptq_marlin_repack.cu
similarity index 100%
rename from csrc/custom_marlin/gptq_marlin/gptq_marlin_repack.cu
rename to archive/csrc/custom_marlin/gptq_marlin/gptq_marlin_repack.cu
diff --git a/csrc/custom_marlin/gptq_marlin/ops.h b/archive/csrc/custom_marlin/gptq_marlin/ops.h
similarity index 100%
rename from csrc/custom_marlin/gptq_marlin/ops.h
rename to archive/csrc/custom_marlin/gptq_marlin/ops.h
diff --git a/csrc/custom_marlin/setup.py b/archive/csrc/custom_marlin/setup.py
similarity index 100%
rename from csrc/custom_marlin/setup.py
rename to archive/csrc/custom_marlin/setup.py
diff --git a/csrc/custom_marlin/test_cuda_graph.py b/archive/csrc/custom_marlin/test_cuda_graph.py
similarity index 100%
rename from csrc/custom_marlin/test_cuda_graph.py
rename to archive/csrc/custom_marlin/test_cuda_graph.py
diff --git a/csrc/custom_marlin/utils/__init__.py b/archive/csrc/custom_marlin/utils/__init__.py
similarity index 100%
rename from csrc/custom_marlin/utils/__init__.py
rename to archive/csrc/custom_marlin/utils/__init__.py
diff --git a/csrc/custom_marlin/utils/format24.py b/archive/csrc/custom_marlin/utils/format24.py
similarity index 100%
rename from csrc/custom_marlin/utils/format24.py
rename to archive/csrc/custom_marlin/utils/format24.py
diff --git a/csrc/custom_marlin/utils/marlin_24_perms.py b/archive/csrc/custom_marlin/utils/marlin_24_perms.py
similarity index 100%
rename from csrc/custom_marlin/utils/marlin_24_perms.py
rename to archive/csrc/custom_marlin/utils/marlin_24_perms.py
diff --git a/csrc/custom_marlin/utils/marlin_perms.py b/archive/csrc/custom_marlin/utils/marlin_perms.py
similarity index 100%
rename from csrc/custom_marlin/utils/marlin_perms.py
rename to archive/csrc/custom_marlin/utils/marlin_perms.py
diff --git a/csrc/custom_marlin/utils/marlin_utils.py b/archive/csrc/custom_marlin/utils/marlin_utils.py
similarity index 100%
rename from csrc/custom_marlin/utils/marlin_utils.py
rename to archive/csrc/custom_marlin/utils/marlin_utils.py
diff --git a/csrc/custom_marlin/utils/quant_utils.py b/archive/csrc/custom_marlin/utils/quant_utils.py
similarity index 100%
rename from csrc/custom_marlin/utils/quant_utils.py
rename to archive/csrc/custom_marlin/utils/quant_utils.py
diff --git a/csrc/ktransformers_ext/CMakeLists.txt b/archive/csrc/ktransformers_ext/CMakeLists.txt
similarity index 100%
rename from csrc/ktransformers_ext/CMakeLists.txt
rename to archive/csrc/ktransformers_ext/CMakeLists.txt
diff --git a/csrc/ktransformers_ext/bench/bench_attention.py b/archive/csrc/ktransformers_ext/bench/bench_attention.py
similarity index 100%
rename from csrc/ktransformers_ext/bench/bench_attention.py
rename to archive/csrc/ktransformers_ext/bench/bench_attention.py
diff --git a/csrc/ktransformers_ext/bench/bench_attention_torch.py b/archive/csrc/ktransformers_ext/bench/bench_attention_torch.py
similarity index 100%
rename from csrc/ktransformers_ext/bench/bench_attention_torch.py
rename to archive/csrc/ktransformers_ext/bench/bench_attention_torch.py
diff --git a/csrc/ktransformers_ext/bench/bench_linear.py b/archive/csrc/ktransformers_ext/bench/bench_linear.py
similarity index 100%
rename from csrc/ktransformers_ext/bench/bench_linear.py
rename to archive/csrc/ktransformers_ext/bench/bench_linear.py
diff --git a/csrc/ktransformers_ext/bench/bench_linear_torch.py b/archive/csrc/ktransformers_ext/bench/bench_linear_torch.py
similarity index 100%
rename from csrc/ktransformers_ext/bench/bench_linear_torch.py
rename to archive/csrc/ktransformers_ext/bench/bench_linear_torch.py
diff --git a/csrc/ktransformers_ext/bench/bench_mlp.py b/archive/csrc/ktransformers_ext/bench/bench_mlp.py
similarity index 100%
rename from csrc/ktransformers_ext/bench/bench_mlp.py
rename to archive/csrc/ktransformers_ext/bench/bench_mlp.py
diff --git a/csrc/ktransformers_ext/bench/bench_mlp_torch.py b/archive/csrc/ktransformers_ext/bench/bench_mlp_torch.py
similarity index 100%
rename from csrc/ktransformers_ext/bench/bench_mlp_torch.py
rename to archive/csrc/ktransformers_ext/bench/bench_mlp_torch.py
diff --git a/csrc/ktransformers_ext/bench/bench_moe.py b/archive/csrc/ktransformers_ext/bench/bench_moe.py
similarity index 100%
rename from csrc/ktransformers_ext/bench/bench_moe.py
rename to archive/csrc/ktransformers_ext/bench/bench_moe.py
diff --git a/csrc/ktransformers_ext/bench/bench_moe_amx.py b/archive/csrc/ktransformers_ext/bench/bench_moe_amx.py
similarity index 100%
rename from csrc/ktransformers_ext/bench/bench_moe_amx.py
rename to archive/csrc/ktransformers_ext/bench/bench_moe_amx.py
diff --git a/csrc/ktransformers_ext/bench/bench_moe_torch.py b/archive/csrc/ktransformers_ext/bench/bench_moe_torch.py
similarity index 100%
rename from csrc/ktransformers_ext/bench/bench_moe_torch.py
rename to archive/csrc/ktransformers_ext/bench/bench_moe_torch.py
diff --git a/csrc/ktransformers_ext/cmake/FindSIMD.cmake b/archive/csrc/ktransformers_ext/cmake/FindSIMD.cmake
similarity index 100%
rename from csrc/ktransformers_ext/cmake/FindSIMD.cmake
rename to archive/csrc/ktransformers_ext/cmake/FindSIMD.cmake
diff --git a/csrc/ktransformers_ext/cpu_backend/backend.cpp b/archive/csrc/ktransformers_ext/cpu_backend/backend.cpp
similarity index 100%
rename from csrc/ktransformers_ext/cpu_backend/backend.cpp
rename to archive/csrc/ktransformers_ext/cpu_backend/backend.cpp
diff --git a/csrc/ktransformers_ext/cpu_backend/backend.h b/archive/csrc/ktransformers_ext/cpu_backend/backend.h
similarity index 100%
rename from csrc/ktransformers_ext/cpu_backend/backend.h
rename to archive/csrc/ktransformers_ext/cpu_backend/backend.h
diff --git a/csrc/ktransformers_ext/cpu_backend/cpuinfer.h b/archive/csrc/ktransformers_ext/cpu_backend/cpuinfer.h
similarity index 100%
rename from csrc/ktransformers_ext/cpu_backend/cpuinfer.h
rename to archive/csrc/ktransformers_ext/cpu_backend/cpuinfer.h
diff --git a/csrc/ktransformers_ext/cpu_backend/shared_mem_buffer.cpp b/archive/csrc/ktransformers_ext/cpu_backend/shared_mem_buffer.cpp
similarity index 100%
rename from csrc/ktransformers_ext/cpu_backend/shared_mem_buffer.cpp
rename to archive/csrc/ktransformers_ext/cpu_backend/shared_mem_buffer.cpp
diff --git a/csrc/ktransformers_ext/cpu_backend/shared_mem_buffer.h b/archive/csrc/ktransformers_ext/cpu_backend/shared_mem_buffer.h
similarity index 100%
rename from csrc/ktransformers_ext/cpu_backend/shared_mem_buffer.h
rename to archive/csrc/ktransformers_ext/cpu_backend/shared_mem_buffer.h
diff --git a/csrc/ktransformers_ext/cpu_backend/task_queue.cpp b/archive/csrc/ktransformers_ext/cpu_backend/task_queue.cpp
similarity index 100%
rename from csrc/ktransformers_ext/cpu_backend/task_queue.cpp
rename to archive/csrc/ktransformers_ext/cpu_backend/task_queue.cpp
diff --git a/csrc/ktransformers_ext/cpu_backend/task_queue.h b/archive/csrc/ktransformers_ext/cpu_backend/task_queue.h
similarity index 100%
rename from csrc/ktransformers_ext/cpu_backend/task_queue.h
rename to archive/csrc/ktransformers_ext/cpu_backend/task_queue.h
diff --git a/csrc/ktransformers_ext/cpu_backend/vendors/README.md b/archive/csrc/ktransformers_ext/cpu_backend/vendors/README.md
similarity index 100%
rename from csrc/ktransformers_ext/cpu_backend/vendors/README.md
rename to archive/csrc/ktransformers_ext/cpu_backend/vendors/README.md
diff --git a/csrc/ktransformers_ext/cpu_backend/vendors/cuda.h b/archive/csrc/ktransformers_ext/cpu_backend/vendors/cuda.h
similarity index 100%
rename from csrc/ktransformers_ext/cpu_backend/vendors/cuda.h
rename to archive/csrc/ktransformers_ext/cpu_backend/vendors/cuda.h
diff --git a/csrc/ktransformers_ext/cpu_backend/vendors/hip.h b/archive/csrc/ktransformers_ext/cpu_backend/vendors/hip.h
similarity index 100%
rename from csrc/ktransformers_ext/cpu_backend/vendors/hip.h
rename to archive/csrc/ktransformers_ext/cpu_backend/vendors/hip.h
diff --git a/csrc/ktransformers_ext/cpu_backend/vendors/musa.h b/archive/csrc/ktransformers_ext/cpu_backend/vendors/musa.h
similarity index 100%
rename from csrc/ktransformers_ext/cpu_backend/vendors/musa.h
rename to archive/csrc/ktransformers_ext/cpu_backend/vendors/musa.h
diff --git a/csrc/ktransformers_ext/cpu_backend/vendors/vendor.h b/archive/csrc/ktransformers_ext/cpu_backend/vendors/vendor.h
similarity index 100%
rename from csrc/ktransformers_ext/cpu_backend/vendors/vendor.h
rename to archive/csrc/ktransformers_ext/cpu_backend/vendors/vendor.h
diff --git a/csrc/ktransformers_ext/cuda/binding.cpp b/archive/csrc/ktransformers_ext/cuda/binding.cpp
similarity index 100%
rename from csrc/ktransformers_ext/cuda/binding.cpp
rename to archive/csrc/ktransformers_ext/cuda/binding.cpp
diff --git a/csrc/ktransformers_ext/cuda/custom_gguf/dequant.cu b/archive/csrc/ktransformers_ext/cuda/custom_gguf/dequant.cu
similarity index 100%
rename from csrc/ktransformers_ext/cuda/custom_gguf/dequant.cu
rename to archive/csrc/ktransformers_ext/cuda/custom_gguf/dequant.cu
diff --git a/csrc/ktransformers_ext/cuda/custom_gguf/ops.h b/archive/csrc/ktransformers_ext/cuda/custom_gguf/ops.h
similarity index 100%
rename from csrc/ktransformers_ext/cuda/custom_gguf/ops.h
rename to archive/csrc/ktransformers_ext/cuda/custom_gguf/ops.h
diff --git a/csrc/ktransformers_ext/cuda/gptq_marlin/gptq_marlin.cu b/archive/csrc/ktransformers_ext/cuda/gptq_marlin/gptq_marlin.cu
similarity index 100%
rename from csrc/ktransformers_ext/cuda/gptq_marlin/gptq_marlin.cu
rename to archive/csrc/ktransformers_ext/cuda/gptq_marlin/gptq_marlin.cu
diff --git a/csrc/ktransformers_ext/cuda/gptq_marlin/gptq_marlin.cuh b/archive/csrc/ktransformers_ext/cuda/gptq_marlin/gptq_marlin.cuh
similarity index 100%
rename from csrc/ktransformers_ext/cuda/gptq_marlin/gptq_marlin.cuh
rename to archive/csrc/ktransformers_ext/cuda/gptq_marlin/gptq_marlin.cuh
diff --git a/csrc/ktransformers_ext/cuda/gptq_marlin/gptq_marlin_dtypes.cuh b/archive/csrc/ktransformers_ext/cuda/gptq_marlin/gptq_marlin_dtypes.cuh
similarity index 100%
rename from csrc/ktransformers_ext/cuda/gptq_marlin/gptq_marlin_dtypes.cuh
rename to archive/csrc/ktransformers_ext/cuda/gptq_marlin/gptq_marlin_dtypes.cuh
diff --git a/csrc/ktransformers_ext/cuda/gptq_marlin/ops.h b/archive/csrc/ktransformers_ext/cuda/gptq_marlin/ops.h
similarity index 100%
rename from csrc/ktransformers_ext/cuda/gptq_marlin/ops.h
rename to archive/csrc/ktransformers_ext/cuda/gptq_marlin/ops.h
diff --git a/csrc/ktransformers_ext/cuda/setup.py b/archive/csrc/ktransformers_ext/cuda/setup.py
similarity index 100%
rename from csrc/ktransformers_ext/cuda/setup.py
rename to archive/csrc/ktransformers_ext/cuda/setup.py
diff --git a/csrc/ktransformers_ext/cuda/test_dequant.py b/archive/csrc/ktransformers_ext/cuda/test_dequant.py
similarity index 100%
rename from csrc/ktransformers_ext/cuda/test_dequant.py
rename to archive/csrc/ktransformers_ext/cuda/test_dequant.py
diff --git a/csrc/ktransformers_ext/examples/test_attention.py b/archive/csrc/ktransformers_ext/examples/test_attention.py
similarity index 100%
rename from csrc/ktransformers_ext/examples/test_attention.py
rename to archive/csrc/ktransformers_ext/examples/test_attention.py
diff --git a/csrc/ktransformers_ext/examples/test_linear.py b/archive/csrc/ktransformers_ext/examples/test_linear.py
similarity index 100%
rename from csrc/ktransformers_ext/examples/test_linear.py
rename to archive/csrc/ktransformers_ext/examples/test_linear.py
diff --git a/csrc/ktransformers_ext/examples/test_mlp.py b/archive/csrc/ktransformers_ext/examples/test_mlp.py
similarity index 100%
rename from csrc/ktransformers_ext/examples/test_mlp.py
rename to archive/csrc/ktransformers_ext/examples/test_mlp.py
diff --git a/csrc/ktransformers_ext/examples/test_moe.py b/archive/csrc/ktransformers_ext/examples/test_moe.py
similarity index 100%
rename from csrc/ktransformers_ext/examples/test_moe.py
rename to archive/csrc/ktransformers_ext/examples/test_moe.py
diff --git a/csrc/ktransformers_ext/ext_bindings.cpp b/archive/csrc/ktransformers_ext/ext_bindings.cpp
similarity index 100%
rename from csrc/ktransformers_ext/ext_bindings.cpp
rename to archive/csrc/ktransformers_ext/ext_bindings.cpp
diff --git a/csrc/ktransformers_ext/operators/amx/la/amx.hpp b/archive/csrc/ktransformers_ext/operators/amx/la/amx.hpp
similarity index 100%
rename from csrc/ktransformers_ext/operators/amx/la/amx.hpp
rename to archive/csrc/ktransformers_ext/operators/amx/la/amx.hpp
diff --git a/csrc/ktransformers_ext/operators/amx/la/utils.hpp b/archive/csrc/ktransformers_ext/operators/amx/la/utils.hpp
similarity index 100%
rename from csrc/ktransformers_ext/operators/amx/la/utils.hpp
rename to archive/csrc/ktransformers_ext/operators/amx/la/utils.hpp
diff --git a/csrc/ktransformers_ext/operators/amx/moe.hpp b/archive/csrc/ktransformers_ext/operators/amx/moe.hpp
similarity index 100%
rename from csrc/ktransformers_ext/operators/amx/moe.hpp
rename to archive/csrc/ktransformers_ext/operators/amx/moe.hpp
diff --git a/csrc/ktransformers_ext/operators/kvcache/kvcache.h b/archive/csrc/ktransformers_ext/operators/kvcache/kvcache.h
similarity index 100%
rename from csrc/ktransformers_ext/operators/kvcache/kvcache.h
rename to archive/csrc/ktransformers_ext/operators/kvcache/kvcache.h
diff --git a/csrc/ktransformers_ext/operators/kvcache/kvcache_attn.cpp b/archive/csrc/ktransformers_ext/operators/kvcache/kvcache_attn.cpp
similarity index 100%
rename from csrc/ktransformers_ext/operators/kvcache/kvcache_attn.cpp
rename to archive/csrc/ktransformers_ext/operators/kvcache/kvcache_attn.cpp
diff --git a/csrc/ktransformers_ext/operators/kvcache/kvcache_load_dump.cpp b/archive/csrc/ktransformers_ext/operators/kvcache/kvcache_load_dump.cpp
similarity index 100%
rename from csrc/ktransformers_ext/operators/kvcache/kvcache_load_dump.cpp
rename to archive/csrc/ktransformers_ext/operators/kvcache/kvcache_load_dump.cpp
diff --git a/csrc/ktransformers_ext/operators/kvcache/kvcache_read_write.cpp b/archive/csrc/ktransformers_ext/operators/kvcache/kvcache_read_write.cpp
similarity index 100%
rename from csrc/ktransformers_ext/operators/kvcache/kvcache_read_write.cpp
rename to archive/csrc/ktransformers_ext/operators/kvcache/kvcache_read_write.cpp
diff --git a/csrc/ktransformers_ext/operators/kvcache/kvcache_utils.cpp b/archive/csrc/ktransformers_ext/operators/kvcache/kvcache_utils.cpp
similarity index 100%
rename from csrc/ktransformers_ext/operators/kvcache/kvcache_utils.cpp
rename to archive/csrc/ktransformers_ext/operators/kvcache/kvcache_utils.cpp
diff --git a/csrc/ktransformers_ext/operators/llamafile/conversion.h b/archive/csrc/ktransformers_ext/operators/llamafile/conversion.h
similarity index 100%
rename from csrc/ktransformers_ext/operators/llamafile/conversion.h
rename to archive/csrc/ktransformers_ext/operators/llamafile/conversion.h
diff --git a/csrc/ktransformers_ext/operators/llamafile/linear.cpp b/archive/csrc/ktransformers_ext/operators/llamafile/linear.cpp
similarity index 100%
rename from csrc/ktransformers_ext/operators/llamafile/linear.cpp
rename to archive/csrc/ktransformers_ext/operators/llamafile/linear.cpp
diff --git a/csrc/ktransformers_ext/operators/llamafile/linear.h b/archive/csrc/ktransformers_ext/operators/llamafile/linear.h
similarity index 100%
rename from csrc/ktransformers_ext/operators/llamafile/linear.h
rename to archive/csrc/ktransformers_ext/operators/llamafile/linear.h
diff --git a/csrc/ktransformers_ext/operators/llamafile/mlp.cpp b/archive/csrc/ktransformers_ext/operators/llamafile/mlp.cpp
similarity index 100%
rename from csrc/ktransformers_ext/operators/llamafile/mlp.cpp
rename to archive/csrc/ktransformers_ext/operators/llamafile/mlp.cpp
diff --git a/csrc/ktransformers_ext/operators/llamafile/mlp.h b/archive/csrc/ktransformers_ext/operators/llamafile/mlp.h
similarity index 100%
rename from csrc/ktransformers_ext/operators/llamafile/mlp.h
rename to archive/csrc/ktransformers_ext/operators/llamafile/mlp.h
diff --git a/csrc/ktransformers_ext/operators/llamafile/moe.cpp b/archive/csrc/ktransformers_ext/operators/llamafile/moe.cpp
similarity index 100%
rename from csrc/ktransformers_ext/operators/llamafile/moe.cpp
rename to archive/csrc/ktransformers_ext/operators/llamafile/moe.cpp
diff --git a/csrc/ktransformers_ext/operators/llamafile/moe.h b/archive/csrc/ktransformers_ext/operators/llamafile/moe.h
similarity index 100%
rename from csrc/ktransformers_ext/operators/llamafile/moe.h
rename to archive/csrc/ktransformers_ext/operators/llamafile/moe.h
diff --git a/csrc/ktransformers_ext/vendors/cuda.h b/archive/csrc/ktransformers_ext/vendors/cuda.h
similarity index 100%
rename from csrc/ktransformers_ext/vendors/cuda.h
rename to archive/csrc/ktransformers_ext/vendors/cuda.h
diff --git a/csrc/ktransformers_ext/vendors/hip.h b/archive/csrc/ktransformers_ext/vendors/hip.h
similarity index 100%
rename from csrc/ktransformers_ext/vendors/hip.h
rename to archive/csrc/ktransformers_ext/vendors/hip.h
diff --git a/csrc/ktransformers_ext/vendors/musa.h b/archive/csrc/ktransformers_ext/vendors/musa.h
similarity index 100%
rename from csrc/ktransformers_ext/vendors/musa.h
rename to archive/csrc/ktransformers_ext/vendors/musa.h
diff --git a/csrc/ktransformers_ext/vendors/vendor.h b/archive/csrc/ktransformers_ext/vendors/vendor.h
similarity index 100%
rename from csrc/ktransformers_ext/vendors/vendor.h
rename to archive/csrc/ktransformers_ext/vendors/vendor.h
diff --git a/install-with-cache.sh b/archive/install-with-cache.sh
similarity index 100%
rename from install-with-cache.sh
rename to archive/install-with-cache.sh
diff --git a/install.bat b/archive/install.bat
similarity index 100%
rename from install.bat
rename to archive/install.bat
diff --git a/install.sh b/archive/install.sh
similarity index 100%
rename from install.sh
rename to archive/install.sh
diff --git a/install_for_npu.sh b/archive/install_for_npu.sh
similarity index 100%
rename from install_for_npu.sh
rename to archive/install_for_npu.sh
diff --git a/ktransformers/__init__.py b/archive/ktransformers/__init__.py
similarity index 100%
rename from ktransformers/__init__.py
rename to archive/ktransformers/__init__.py
diff --git a/ktransformers/configs/config.yaml b/archive/ktransformers/configs/config.yaml
similarity index 100%
rename from ktransformers/configs/config.yaml
rename to archive/ktransformers/configs/config.yaml
diff --git a/ktransformers/configs/log_config.ini b/archive/ktransformers/configs/log_config.ini
similarity index 100%
rename from ktransformers/configs/log_config.ini
rename to archive/ktransformers/configs/log_config.ini
diff --git a/ktransformers/ktransformers b/archive/ktransformers/ktransformers
similarity index 100%
rename from ktransformers/ktransformers
rename to archive/ktransformers/ktransformers
diff --git a/ktransformers/ktransformers_ext/operators/custom_marlin/quantize/utils/__init__.py b/archive/ktransformers/ktransformers_ext/operators/custom_marlin/quantize/utils/__init__.py
similarity index 100%
rename from ktransformers/ktransformers_ext/operators/custom_marlin/quantize/utils/__init__.py
rename to archive/ktransformers/ktransformers_ext/operators/custom_marlin/quantize/utils/__init__.py
diff --git a/ktransformers/ktransformers_ext/operators/custom_marlin/quantize/utils/format_24.py b/archive/ktransformers/ktransformers_ext/operators/custom_marlin/quantize/utils/format_24.py
similarity index 100%
rename from ktransformers/ktransformers_ext/operators/custom_marlin/quantize/utils/format_24.py
rename to archive/ktransformers/ktransformers_ext/operators/custom_marlin/quantize/utils/format_24.py
diff --git a/ktransformers/ktransformers_ext/operators/custom_marlin/quantize/utils/marlin_24_perms.py b/archive/ktransformers/ktransformers_ext/operators/custom_marlin/quantize/utils/marlin_24_perms.py
similarity index 100%
rename from ktransformers/ktransformers_ext/operators/custom_marlin/quantize/utils/marlin_24_perms.py
rename to archive/ktransformers/ktransformers_ext/operators/custom_marlin/quantize/utils/marlin_24_perms.py
diff --git a/ktransformers/ktransformers_ext/operators/custom_marlin/quantize/utils/marlin_perms.py b/archive/ktransformers/ktransformers_ext/operators/custom_marlin/quantize/utils/marlin_perms.py
similarity index 100%
rename from ktransformers/ktransformers_ext/operators/custom_marlin/quantize/utils/marlin_perms.py
rename to archive/ktransformers/ktransformers_ext/operators/custom_marlin/quantize/utils/marlin_perms.py
diff --git a/ktransformers/ktransformers_ext/operators/custom_marlin/quantize/utils/marlin_utils.py b/archive/ktransformers/ktransformers_ext/operators/custom_marlin/quantize/utils/marlin_utils.py
similarity index 100%
rename from ktransformers/ktransformers_ext/operators/custom_marlin/quantize/utils/marlin_utils.py
rename to archive/ktransformers/ktransformers_ext/operators/custom_marlin/quantize/utils/marlin_utils.py
diff --git a/ktransformers/ktransformers_ext/operators/custom_marlin/quantize/utils/quant_utils.py b/archive/ktransformers/ktransformers_ext/operators/custom_marlin/quantize/utils/quant_utils.py
similarity index 100%
rename from ktransformers/ktransformers_ext/operators/custom_marlin/quantize/utils/quant_utils.py
rename to archive/ktransformers/ktransformers_ext/operators/custom_marlin/quantize/utils/quant_utils.py
diff --git a/ktransformers/ktransformers_ext/triton/fp8gemm.py b/archive/ktransformers/ktransformers_ext/triton/fp8gemm.py
similarity index 100%
rename from ktransformers/ktransformers_ext/triton/fp8gemm.py
rename to archive/ktransformers/ktransformers_ext/triton/fp8gemm.py
diff --git a/ktransformers/local_chat.py b/archive/ktransformers/local_chat.py
similarity index 100%
rename from ktransformers/local_chat.py
rename to archive/ktransformers/local_chat.py
diff --git a/ktransformers/local_chat_test.py b/archive/ktransformers/local_chat_test.py
similarity index 100%
rename from ktransformers/local_chat_test.py
rename to archive/ktransformers/local_chat_test.py
diff --git a/ktransformers/models/__init__.py b/archive/ktransformers/models/__init__.py
similarity index 100%
rename from ktransformers/models/__init__.py
rename to archive/ktransformers/models/__init__.py
diff --git a/ktransformers/models/ascend/custom_ascend_modeling_deepseek_v3.py b/archive/ktransformers/models/ascend/custom_ascend_modeling_deepseek_v3.py
similarity index 100%
rename from ktransformers/models/ascend/custom_ascend_modeling_deepseek_v3.py
rename to archive/ktransformers/models/ascend/custom_ascend_modeling_deepseek_v3.py
diff --git a/ktransformers/models/configuration_deepseek.py b/archive/ktransformers/models/configuration_deepseek.py
similarity index 100%
rename from ktransformers/models/configuration_deepseek.py
rename to archive/ktransformers/models/configuration_deepseek.py
diff --git a/ktransformers/models/configuration_deepseek_v3.py b/archive/ktransformers/models/configuration_deepseek_v3.py
similarity index 100%
rename from ktransformers/models/configuration_deepseek_v3.py
rename to archive/ktransformers/models/configuration_deepseek_v3.py
diff --git a/ktransformers/models/configuration_glm4_moe.py b/archive/ktransformers/models/configuration_glm4_moe.py
similarity index 100%
rename from ktransformers/models/configuration_glm4_moe.py
rename to archive/ktransformers/models/configuration_glm4_moe.py
diff --git a/ktransformers/models/configuration_llama.py b/archive/ktransformers/models/configuration_llama.py
similarity index 100%
rename from ktransformers/models/configuration_llama.py
rename to archive/ktransformers/models/configuration_llama.py
diff --git a/ktransformers/models/configuration_qwen2_moe.py b/archive/ktransformers/models/configuration_qwen2_moe.py
similarity index 100%
rename from ktransformers/models/configuration_qwen2_moe.py
rename to archive/ktransformers/models/configuration_qwen2_moe.py
diff --git a/ktransformers/models/configuration_qwen3_moe.py b/archive/ktransformers/models/configuration_qwen3_moe.py
similarity index 100%
rename from ktransformers/models/configuration_qwen3_moe.py
rename to archive/ktransformers/models/configuration_qwen3_moe.py
diff --git a/ktransformers/models/configuration_qwen3_next.py b/archive/ktransformers/models/configuration_qwen3_next.py
similarity index 100%
rename from ktransformers/models/configuration_qwen3_next.py
rename to archive/ktransformers/models/configuration_qwen3_next.py
diff --git a/ktransformers/models/configuration_smallthinker.py b/archive/ktransformers/models/configuration_smallthinker.py
similarity index 100%
rename from ktransformers/models/configuration_smallthinker.py
rename to archive/ktransformers/models/configuration_smallthinker.py
diff --git a/ktransformers/models/custom_cache.py b/archive/ktransformers/models/custom_cache.py
similarity index 100%
rename from ktransformers/models/custom_cache.py
rename to archive/ktransformers/models/custom_cache.py
diff --git a/ktransformers/models/custom_modeling_deepseek_v2.py b/archive/ktransformers/models/custom_modeling_deepseek_v2.py
similarity index 100%
rename from ktransformers/models/custom_modeling_deepseek_v2.py
rename to archive/ktransformers/models/custom_modeling_deepseek_v2.py
diff --git a/ktransformers/models/custom_modeling_deepseek_v3.py b/archive/ktransformers/models/custom_modeling_deepseek_v3.py
similarity index 100%
rename from ktransformers/models/custom_modeling_deepseek_v3.py
rename to archive/ktransformers/models/custom_modeling_deepseek_v3.py
diff --git a/ktransformers/models/custom_modeling_glm4_moe.py b/archive/ktransformers/models/custom_modeling_glm4_moe.py
similarity index 100%
rename from ktransformers/models/custom_modeling_glm4_moe.py
rename to archive/ktransformers/models/custom_modeling_glm4_moe.py
diff --git a/ktransformers/models/custom_modeling_qwen2_moe.py b/archive/ktransformers/models/custom_modeling_qwen2_moe.py
similarity index 100%
rename from ktransformers/models/custom_modeling_qwen2_moe.py
rename to archive/ktransformers/models/custom_modeling_qwen2_moe.py
diff --git a/ktransformers/models/custom_modeling_qwen3_moe.py b/archive/ktransformers/models/custom_modeling_qwen3_moe.py
similarity index 100%
rename from ktransformers/models/custom_modeling_qwen3_moe.py
rename to archive/ktransformers/models/custom_modeling_qwen3_moe.py
diff --git a/ktransformers/models/custom_modeling_qwen3_next.py b/archive/ktransformers/models/custom_modeling_qwen3_next.py
similarity index 100%
rename from ktransformers/models/custom_modeling_qwen3_next.py
rename to archive/ktransformers/models/custom_modeling_qwen3_next.py
diff --git a/ktransformers/models/custom_modeling_smallthinker.py b/archive/ktransformers/models/custom_modeling_smallthinker.py
similarity index 100%
rename from ktransformers/models/custom_modeling_smallthinker.py
rename to archive/ktransformers/models/custom_modeling_smallthinker.py
diff --git a/ktransformers/models/modeling_deepseek.py b/archive/ktransformers/models/modeling_deepseek.py
similarity index 100%
rename from ktransformers/models/modeling_deepseek.py
rename to archive/ktransformers/models/modeling_deepseek.py
diff --git a/ktransformers/models/modeling_deepseek_v3.py b/archive/ktransformers/models/modeling_deepseek_v3.py
similarity index 100%
rename from ktransformers/models/modeling_deepseek_v3.py
rename to archive/ktransformers/models/modeling_deepseek_v3.py
diff --git a/ktransformers/models/modeling_glm4_moe.py b/archive/ktransformers/models/modeling_glm4_moe.py
similarity index 100%
rename from ktransformers/models/modeling_glm4_moe.py
rename to archive/ktransformers/models/modeling_glm4_moe.py
diff --git a/ktransformers/models/modeling_llama.py b/archive/ktransformers/models/modeling_llama.py
similarity index 100%
rename from ktransformers/models/modeling_llama.py
rename to archive/ktransformers/models/modeling_llama.py
diff --git a/ktransformers/models/modeling_mixtral.py b/archive/ktransformers/models/modeling_mixtral.py
similarity index 100%
rename from ktransformers/models/modeling_mixtral.py
rename to archive/ktransformers/models/modeling_mixtral.py
diff --git a/ktransformers/models/modeling_qwen2_moe.py b/archive/ktransformers/models/modeling_qwen2_moe.py
similarity index 100%
rename from ktransformers/models/modeling_qwen2_moe.py
rename to archive/ktransformers/models/modeling_qwen2_moe.py
diff --git a/ktransformers/models/modeling_qwen3_moe.py b/archive/ktransformers/models/modeling_qwen3_moe.py
similarity index 100%
rename from ktransformers/models/modeling_qwen3_moe.py
rename to archive/ktransformers/models/modeling_qwen3_moe.py
diff --git a/ktransformers/models/modeling_qwen3_next.py b/archive/ktransformers/models/modeling_qwen3_next.py
similarity index 100%
rename from ktransformers/models/modeling_qwen3_next.py
rename to archive/ktransformers/models/modeling_qwen3_next.py
diff --git a/ktransformers/models/modeling_smallthinker.py b/archive/ktransformers/models/modeling_smallthinker.py
similarity index 100%
rename from ktransformers/models/modeling_smallthinker.py
rename to archive/ktransformers/models/modeling_smallthinker.py
diff --git a/ktransformers/operators/RoPE.py b/archive/ktransformers/operators/RoPE.py
similarity index 100%
rename from ktransformers/operators/RoPE.py
rename to archive/ktransformers/operators/RoPE.py
diff --git a/ktransformers/operators/__init__.py b/archive/ktransformers/operators/__init__.py
similarity index 100%
rename from ktransformers/operators/__init__.py
rename to archive/ktransformers/operators/__init__.py
diff --git a/ktransformers/operators/ascend/ascend_attention.py b/archive/ktransformers/operators/ascend/ascend_attention.py
similarity index 100%
rename from ktransformers/operators/ascend/ascend_attention.py
rename to archive/ktransformers/operators/ascend/ascend_attention.py
diff --git a/ktransformers/operators/ascend/ascend_experts.py b/archive/ktransformers/operators/ascend/ascend_experts.py
similarity index 100%
rename from ktransformers/operators/ascend/ascend_experts.py
rename to archive/ktransformers/operators/ascend/ascend_experts.py
diff --git a/ktransformers/operators/ascend/ascend_gate.py b/archive/ktransformers/operators/ascend/ascend_gate.py
similarity index 100%
rename from ktransformers/operators/ascend/ascend_gate.py
rename to archive/ktransformers/operators/ascend/ascend_gate.py
diff --git a/ktransformers/operators/ascend/ascend_layernorm.py b/archive/ktransformers/operators/ascend/ascend_layernorm.py
similarity index 100%
rename from ktransformers/operators/ascend/ascend_layernorm.py
rename to archive/ktransformers/operators/ascend/ascend_layernorm.py
diff --git a/ktransformers/operators/ascend/ascend_linear.py b/archive/ktransformers/operators/ascend/ascend_linear.py
similarity index 100%
rename from ktransformers/operators/ascend/ascend_linear.py
rename to archive/ktransformers/operators/ascend/ascend_linear.py
diff --git a/ktransformers/operators/ascend/ascend_mlp.py b/archive/ktransformers/operators/ascend/ascend_mlp.py
similarity index 100%
rename from ktransformers/operators/ascend/ascend_mlp.py
rename to archive/ktransformers/operators/ascend/ascend_mlp.py
diff --git a/ktransformers/operators/attention.py b/archive/ktransformers/operators/attention.py
similarity index 100%
rename from ktransformers/operators/attention.py
rename to archive/ktransformers/operators/attention.py
diff --git a/ktransformers/operators/balance_serve_attention.py b/archive/ktransformers/operators/balance_serve_attention.py
similarity index 100%
rename from ktransformers/operators/balance_serve_attention.py
rename to archive/ktransformers/operators/balance_serve_attention.py
diff --git a/ktransformers/operators/base_operator.py b/archive/ktransformers/operators/base_operator.py
similarity index 100%
rename from ktransformers/operators/base_operator.py
rename to archive/ktransformers/operators/base_operator.py
diff --git a/ktransformers/operators/cpuinfer.py b/archive/ktransformers/operators/cpuinfer.py
similarity index 100%
rename from ktransformers/operators/cpuinfer.py
rename to archive/ktransformers/operators/cpuinfer.py
diff --git a/ktransformers/operators/dynamic_attention.py b/archive/ktransformers/operators/dynamic_attention.py
similarity index 100%
rename from ktransformers/operators/dynamic_attention.py
rename to archive/ktransformers/operators/dynamic_attention.py
diff --git a/ktransformers/operators/experts.py b/archive/ktransformers/operators/experts.py
similarity index 100%
rename from ktransformers/operators/experts.py
rename to archive/ktransformers/operators/experts.py
diff --git a/ktransformers/operators/flashinfer_batch_prefill_wrapper.py b/archive/ktransformers/operators/flashinfer_batch_prefill_wrapper.py
similarity index 100%
rename from ktransformers/operators/flashinfer_batch_prefill_wrapper.py
rename to archive/ktransformers/operators/flashinfer_batch_prefill_wrapper.py
diff --git a/ktransformers/operators/flashinfer_wrapper.py b/archive/ktransformers/operators/flashinfer_wrapper.py
similarity index 100%
rename from ktransformers/operators/flashinfer_wrapper.py
rename to archive/ktransformers/operators/flashinfer_wrapper.py
diff --git a/ktransformers/operators/gate.py b/archive/ktransformers/operators/gate.py
similarity index 100%
rename from ktransformers/operators/gate.py
rename to archive/ktransformers/operators/gate.py
diff --git a/ktransformers/operators/layernorm.py b/archive/ktransformers/operators/layernorm.py
similarity index 100%
rename from ktransformers/operators/layernorm.py
rename to archive/ktransformers/operators/layernorm.py
diff --git a/ktransformers/operators/linear.py b/archive/ktransformers/operators/linear.py
similarity index 100%
rename from ktransformers/operators/linear.py
rename to archive/ktransformers/operators/linear.py
diff --git a/ktransformers/operators/mlp.py b/archive/ktransformers/operators/mlp.py
similarity index 100%
rename from ktransformers/operators/mlp.py
rename to archive/ktransformers/operators/mlp.py
diff --git a/ktransformers/operators/models.py b/archive/ktransformers/operators/models.py
similarity index 100%
rename from ktransformers/operators/models.py
rename to archive/ktransformers/operators/models.py
diff --git a/ktransformers/operators/triton_attention.py b/archive/ktransformers/operators/triton_attention.py
similarity index 100%
rename from ktransformers/operators/triton_attention.py
rename to archive/ktransformers/operators/triton_attention.py
diff --git a/ktransformers/operators/triton_attention_prefill.py b/archive/ktransformers/operators/triton_attention_prefill.py
similarity index 100%
rename from ktransformers/operators/triton_attention_prefill.py
rename to archive/ktransformers/operators/triton_attention_prefill.py
diff --git a/ktransformers/optimize/optimize.py b/archive/ktransformers/optimize/optimize.py
similarity index 100%
rename from ktransformers/optimize/optimize.py
rename to archive/ktransformers/optimize/optimize.py
diff --git a/ktransformers/optimize/optimize_rules/DeepSeek-V2-Chat-multi-gpu-4.yaml b/archive/ktransformers/optimize/optimize_rules/DeepSeek-V2-Chat-multi-gpu-4.yaml
similarity index 100%
rename from ktransformers/optimize/optimize_rules/DeepSeek-V2-Chat-multi-gpu-4.yaml
rename to archive/ktransformers/optimize/optimize_rules/DeepSeek-V2-Chat-multi-gpu-4.yaml
diff --git a/ktransformers/optimize/optimize_rules/DeepSeek-V2-Chat-multi-gpu.yaml b/archive/ktransformers/optimize/optimize_rules/DeepSeek-V2-Chat-multi-gpu.yaml
similarity index 100%
rename from ktransformers/optimize/optimize_rules/DeepSeek-V2-Chat-multi-gpu.yaml
rename to archive/ktransformers/optimize/optimize_rules/DeepSeek-V2-Chat-multi-gpu.yaml
diff --git a/ktransformers/optimize/optimize_rules/DeepSeek-V2-Chat.yaml b/archive/ktransformers/optimize/optimize_rules/DeepSeek-V2-Chat.yaml
similarity index 100%
rename from ktransformers/optimize/optimize_rules/DeepSeek-V2-Chat.yaml
rename to archive/ktransformers/optimize/optimize_rules/DeepSeek-V2-Chat.yaml
diff --git a/ktransformers/optimize/optimize_rules/DeepSeek-V2-Lite-Chat-gpu-cpu.yaml b/archive/ktransformers/optimize/optimize_rules/DeepSeek-V2-Lite-Chat-gpu-cpu.yaml
similarity index 100%
rename from ktransformers/optimize/optimize_rules/DeepSeek-V2-Lite-Chat-gpu-cpu.yaml
rename to archive/ktransformers/optimize/optimize_rules/DeepSeek-V2-Lite-Chat-gpu-cpu.yaml
diff --git a/ktransformers/optimize/optimize_rules/DeepSeek-V2-Lite-Chat-multi-gpu.yaml b/archive/ktransformers/optimize/optimize_rules/DeepSeek-V2-Lite-Chat-multi-gpu.yaml
similarity index 100%
rename from ktransformers/optimize/optimize_rules/DeepSeek-V2-Lite-Chat-multi-gpu.yaml
rename to archive/ktransformers/optimize/optimize_rules/DeepSeek-V2-Lite-Chat-multi-gpu.yaml
diff --git a/ktransformers/optimize/optimize_rules/DeepSeek-V2-Lite-Chat.yaml b/archive/ktransformers/optimize/optimize_rules/DeepSeek-V2-Lite-Chat.yaml
similarity index 100%
rename from ktransformers/optimize/optimize_rules/DeepSeek-V2-Lite-Chat.yaml
rename to archive/ktransformers/optimize/optimize_rules/DeepSeek-V2-Lite-Chat.yaml
diff --git a/ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat-amx.yaml b/archive/ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat-amx.yaml
similarity index 100%
rename from ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat-amx.yaml
rename to archive/ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat-amx.yaml
diff --git a/ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat-fp8-linear-ggml-experts-serve-amx.yaml b/archive/ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat-fp8-linear-ggml-experts-serve-amx.yaml
similarity index 100%
rename from ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat-fp8-linear-ggml-experts-serve-amx.yaml
rename to archive/ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat-fp8-linear-ggml-experts-serve-amx.yaml
diff --git a/ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat-fp8-linear-ggml-experts-serve.yaml b/archive/ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat-fp8-linear-ggml-experts-serve.yaml
similarity index 100%
rename from ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat-fp8-linear-ggml-experts-serve.yaml
rename to archive/ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat-fp8-linear-ggml-experts-serve.yaml
diff --git a/ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat-fp8-linear-ggml-experts.yaml b/archive/ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat-fp8-linear-ggml-experts.yaml
similarity index 100%
rename from ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat-fp8-linear-ggml-experts.yaml
rename to archive/ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat-fp8-linear-ggml-experts.yaml
diff --git a/ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat-multi-gpu-4.yaml b/archive/ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat-multi-gpu-4.yaml
similarity index 100%
rename from ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat-multi-gpu-4.yaml
rename to archive/ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat-multi-gpu-4.yaml
diff --git a/ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat-multi-gpu-8.yaml b/archive/ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat-multi-gpu-8.yaml
similarity index 100%
rename from ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat-multi-gpu-8.yaml
rename to archive/ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat-multi-gpu-8.yaml
diff --git a/ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat-multi-gpu-fp8-linear-ggml-experts.yaml b/archive/ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat-multi-gpu-fp8-linear-ggml-experts.yaml
similarity index 100%
rename from ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat-multi-gpu-fp8-linear-ggml-experts.yaml
rename to archive/ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat-multi-gpu-fp8-linear-ggml-experts.yaml
diff --git a/ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat-multi-gpu-marlin.yaml b/archive/ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat-multi-gpu-marlin.yaml
similarity index 100%
rename from ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat-multi-gpu-marlin.yaml
rename to archive/ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat-multi-gpu-marlin.yaml
diff --git a/ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat-multi-gpu.yaml b/archive/ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat-multi-gpu.yaml
similarity index 100%
rename from ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat-multi-gpu.yaml
rename to archive/ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat-multi-gpu.yaml
diff --git a/ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat-npu.yaml b/archive/ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat-npu.yaml
similarity index 100%
rename from ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat-npu.yaml
rename to archive/ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat-npu.yaml
diff --git a/ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat-serve.yaml b/archive/ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat-serve.yaml
similarity index 100%
rename from ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat-serve.yaml
rename to archive/ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat-serve.yaml
diff --git a/ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat.yaml b/archive/ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat.yaml
similarity index 100%
rename from ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat.yaml
rename to archive/ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat.yaml
diff --git a/ktransformers/optimize/optimize_rules/Glm4Moe-serve.yaml b/archive/ktransformers/optimize/optimize_rules/Glm4Moe-serve.yaml
similarity index 100%
rename from ktransformers/optimize/optimize_rules/Glm4Moe-serve.yaml
rename to archive/ktransformers/optimize/optimize_rules/Glm4Moe-serve.yaml
diff --git a/ktransformers/optimize/optimize_rules/Internlm2_5-7b-Chat-1m.yaml b/archive/ktransformers/optimize/optimize_rules/Internlm2_5-7b-Chat-1m.yaml
similarity index 100%
rename from ktransformers/optimize/optimize_rules/Internlm2_5-7b-Chat-1m.yaml
rename to archive/ktransformers/optimize/optimize_rules/Internlm2_5-7b-Chat-1m.yaml
diff --git a/ktransformers/optimize/optimize_rules/Mixtral.yaml b/archive/ktransformers/optimize/optimize_rules/Mixtral.yaml
similarity index 100%
rename from ktransformers/optimize/optimize_rules/Mixtral.yaml
rename to archive/ktransformers/optimize/optimize_rules/Mixtral.yaml
diff --git a/ktransformers/optimize/optimize_rules/Moonlight-16B-A3B-serve.yaml b/archive/ktransformers/optimize/optimize_rules/Moonlight-16B-A3B-serve.yaml
similarity index 100%
rename from ktransformers/optimize/optimize_rules/Moonlight-16B-A3B-serve.yaml
rename to archive/ktransformers/optimize/optimize_rules/Moonlight-16B-A3B-serve.yaml
diff --git a/ktransformers/optimize/optimize_rules/Moonlight-16B-A3B.yaml b/archive/ktransformers/optimize/optimize_rules/Moonlight-16B-A3B.yaml
similarity index 100%
rename from ktransformers/optimize/optimize_rules/Moonlight-16B-A3B.yaml
rename to archive/ktransformers/optimize/optimize_rules/Moonlight-16B-A3B.yaml
diff --git a/ktransformers/optimize/optimize_rules/Qwen2-57B-A14B-Instruct-multi-gpu.yaml b/archive/ktransformers/optimize/optimize_rules/Qwen2-57B-A14B-Instruct-multi-gpu.yaml
similarity index 100%
rename from ktransformers/optimize/optimize_rules/Qwen2-57B-A14B-Instruct-multi-gpu.yaml
rename to archive/ktransformers/optimize/optimize_rules/Qwen2-57B-A14B-Instruct-multi-gpu.yaml
diff --git a/ktransformers/optimize/optimize_rules/Qwen2-57B-A14B-Instruct.yaml b/archive/ktransformers/optimize/optimize_rules/Qwen2-57B-A14B-Instruct.yaml
similarity index 100%
rename from ktransformers/optimize/optimize_rules/Qwen2-57B-A14B-Instruct.yaml
rename to archive/ktransformers/optimize/optimize_rules/Qwen2-57B-A14B-Instruct.yaml
diff --git a/ktransformers/optimize/optimize_rules/Qwen2-serve-amx.yaml b/archive/ktransformers/optimize/optimize_rules/Qwen2-serve-amx.yaml
similarity index 100%
rename from ktransformers/optimize/optimize_rules/Qwen2-serve-amx.yaml
rename to archive/ktransformers/optimize/optimize_rules/Qwen2-serve-amx.yaml
diff --git a/ktransformers/optimize/optimize_rules/Qwen2-serve.yaml b/archive/ktransformers/optimize/optimize_rules/Qwen2-serve.yaml
similarity index 100%
rename from ktransformers/optimize/optimize_rules/Qwen2-serve.yaml
rename to archive/ktransformers/optimize/optimize_rules/Qwen2-serve.yaml
diff --git a/ktransformers/optimize/optimize_rules/Qwen3Moe-serve-amx.yaml b/archive/ktransformers/optimize/optimize_rules/Qwen3Moe-serve-amx.yaml
similarity index 100%
rename from ktransformers/optimize/optimize_rules/Qwen3Moe-serve-amx.yaml
rename to archive/ktransformers/optimize/optimize_rules/Qwen3Moe-serve-amx.yaml
diff --git a/ktransformers/optimize/optimize_rules/Qwen3Moe-serve.yaml b/archive/ktransformers/optimize/optimize_rules/Qwen3Moe-serve.yaml
similarity index 100%
rename from ktransformers/optimize/optimize_rules/Qwen3Moe-serve.yaml
rename to archive/ktransformers/optimize/optimize_rules/Qwen3Moe-serve.yaml
diff --git a/ktransformers/optimize/optimize_rules/Qwen3Next-serve.yaml b/archive/ktransformers/optimize/optimize_rules/Qwen3Next-serve.yaml
similarity index 100%
rename from ktransformers/optimize/optimize_rules/Qwen3Next-serve.yaml
rename to archive/ktransformers/optimize/optimize_rules/Qwen3Next-serve.yaml
diff --git a/ktransformers/optimize/optimize_rules/Smallthinker-serve.yaml b/archive/ktransformers/optimize/optimize_rules/Smallthinker-serve.yaml
similarity index 100%
rename from ktransformers/optimize/optimize_rules/Smallthinker-serve.yaml
rename to archive/ktransformers/optimize/optimize_rules/Smallthinker-serve.yaml
diff --git a/ktransformers/optimize/optimize_rules/npu/DeepSeek-V3-Chat-300IA2-npu-serve.yaml b/archive/ktransformers/optimize/optimize_rules/npu/DeepSeek-V3-Chat-300IA2-npu-serve.yaml
similarity index 100%
rename from ktransformers/optimize/optimize_rules/npu/DeepSeek-V3-Chat-300IA2-npu-serve.yaml
rename to archive/ktransformers/optimize/optimize_rules/npu/DeepSeek-V3-Chat-300IA2-npu-serve.yaml
diff --git a/ktransformers/optimize/optimize_rules/npu/DeepSeek-V3-Chat-300IA2-npu.yaml b/archive/ktransformers/optimize/optimize_rules/npu/DeepSeek-V3-Chat-300IA2-npu.yaml
similarity index 100%
rename from ktransformers/optimize/optimize_rules/npu/DeepSeek-V3-Chat-300IA2-npu.yaml
rename to archive/ktransformers/optimize/optimize_rules/npu/DeepSeek-V3-Chat-300IA2-npu.yaml
diff --git a/ktransformers/optimize/optimize_rules/rocm/DeepSeek-V3-Chat.yaml b/archive/ktransformers/optimize/optimize_rules/rocm/DeepSeek-V3-Chat.yaml
similarity index 100%
rename from ktransformers/optimize/optimize_rules/rocm/DeepSeek-V3-Chat.yaml
rename to archive/ktransformers/optimize/optimize_rules/rocm/DeepSeek-V3-Chat.yaml
diff --git a/ktransformers/optimize/optimize_rules/xpu/DeepSeek-V2-Chat.yaml b/archive/ktransformers/optimize/optimize_rules/xpu/DeepSeek-V2-Chat.yaml
similarity index 100%
rename from ktransformers/optimize/optimize_rules/xpu/DeepSeek-V2-Chat.yaml
rename to archive/ktransformers/optimize/optimize_rules/xpu/DeepSeek-V2-Chat.yaml
diff --git a/ktransformers/optimize/optimize_rules/xpu/DeepSeek-V3-Chat.yaml b/archive/ktransformers/optimize/optimize_rules/xpu/DeepSeek-V3-Chat.yaml
similarity index 100%
rename from ktransformers/optimize/optimize_rules/xpu/DeepSeek-V3-Chat.yaml
rename to archive/ktransformers/optimize/optimize_rules/xpu/DeepSeek-V3-Chat.yaml
diff --git a/ktransformers/optimize/optimize_rules/xpu/Qwen3Moe-Chat.yaml b/archive/ktransformers/optimize/optimize_rules/xpu/Qwen3Moe-Chat.yaml
similarity index 100%
rename from ktransformers/optimize/optimize_rules/xpu/Qwen3Moe-Chat.yaml
rename to archive/ktransformers/optimize/optimize_rules/xpu/Qwen3Moe-Chat.yaml
diff --git a/ktransformers/server/__init__.py b/archive/ktransformers/server/__init__.py
similarity index 100%
rename from ktransformers/server/__init__.py
rename to archive/ktransformers/server/__init__.py
diff --git a/ktransformers/server/api/__init__.py b/archive/ktransformers/server/api/__init__.py
similarity index 100%
rename from ktransformers/server/api/__init__.py
rename to archive/ktransformers/server/api/__init__.py
diff --git a/ktransformers/server/api/ollama/__init__.py b/archive/ktransformers/server/api/ollama/__init__.py
similarity index 100%
rename from ktransformers/server/api/ollama/__init__.py
rename to archive/ktransformers/server/api/ollama/__init__.py
diff --git a/ktransformers/server/api/ollama/completions.py b/archive/ktransformers/server/api/ollama/completions.py
similarity index 100%
rename from ktransformers/server/api/ollama/completions.py
rename to archive/ktransformers/server/api/ollama/completions.py
diff --git a/ktransformers/server/api/openai/__init__.py b/archive/ktransformers/server/api/openai/__init__.py
similarity index 100%
rename from ktransformers/server/api/openai/__init__.py
rename to archive/ktransformers/server/api/openai/__init__.py
diff --git a/ktransformers/server/api/openai/assistants/__init__.py b/archive/ktransformers/server/api/openai/assistants/__init__.py
similarity index 100%
rename from ktransformers/server/api/openai/assistants/__init__.py
rename to archive/ktransformers/server/api/openai/assistants/__init__.py
diff --git a/ktransformers/server/api/openai/assistants/assistants.py b/archive/ktransformers/server/api/openai/assistants/assistants.py
similarity index 100%
rename from ktransformers/server/api/openai/assistants/assistants.py
rename to archive/ktransformers/server/api/openai/assistants/assistants.py
diff --git a/ktransformers/server/api/openai/assistants/messages.py b/archive/ktransformers/server/api/openai/assistants/messages.py
similarity index 100%
rename from ktransformers/server/api/openai/assistants/messages.py
rename to archive/ktransformers/server/api/openai/assistants/messages.py
diff --git a/ktransformers/server/api/openai/assistants/runs.py b/archive/ktransformers/server/api/openai/assistants/runs.py
similarity index 100%
rename from ktransformers/server/api/openai/assistants/runs.py
rename to archive/ktransformers/server/api/openai/assistants/runs.py
diff --git a/ktransformers/server/api/openai/assistants/threads.py b/archive/ktransformers/server/api/openai/assistants/threads.py
similarity index 100%
rename from ktransformers/server/api/openai/assistants/threads.py
rename to archive/ktransformers/server/api/openai/assistants/threads.py
diff --git a/ktransformers/server/api/openai/endpoints/__init__.py b/archive/ktransformers/server/api/openai/endpoints/__init__.py
similarity index 100%
rename from ktransformers/server/api/openai/endpoints/__init__.py
rename to archive/ktransformers/server/api/openai/endpoints/__init__.py
diff --git a/ktransformers/server/api/openai/endpoints/chat.py b/archive/ktransformers/server/api/openai/endpoints/chat.py
similarity index 100%
rename from ktransformers/server/api/openai/endpoints/chat.py
rename to archive/ktransformers/server/api/openai/endpoints/chat.py
diff --git a/ktransformers/server/api/openai/legacy/__init__.py b/archive/ktransformers/server/api/openai/legacy/__init__.py
similarity index 100%
rename from ktransformers/server/api/openai/legacy/__init__.py
rename to archive/ktransformers/server/api/openai/legacy/__init__.py
diff --git a/ktransformers/server/api/openai/legacy/completions.py b/archive/ktransformers/server/api/openai/legacy/completions.py
similarity index 100%
rename from ktransformers/server/api/openai/legacy/completions.py
rename to archive/ktransformers/server/api/openai/legacy/completions.py
diff --git a/ktransformers/server/api/web/__init__.py b/archive/ktransformers/server/api/web/__init__.py
similarity index 100%
rename from ktransformers/server/api/web/__init__.py
rename to archive/ktransformers/server/api/web/__init__.py
diff --git a/ktransformers/server/api/web/system.py b/archive/ktransformers/server/api/web/system.py
similarity index 100%
rename from ktransformers/server/api/web/system.py
rename to archive/ktransformers/server/api/web/system.py
diff --git a/ktransformers/server/args.py b/archive/ktransformers/server/args.py
similarity index 100%
rename from ktransformers/server/args.py
rename to archive/ktransformers/server/args.py
diff --git a/ktransformers/server/backend/__init__.py b/archive/ktransformers/server/backend/__init__.py
similarity index 100%
rename from ktransformers/server/backend/__init__.py
rename to archive/ktransformers/server/backend/__init__.py
diff --git a/ktransformers/server/backend/args.py b/archive/ktransformers/server/backend/args.py
similarity index 100%
rename from ktransformers/server/backend/args.py
rename to archive/ktransformers/server/backend/args.py
diff --git a/ktransformers/server/backend/base.py b/archive/ktransformers/server/backend/base.py
similarity index 100%
rename from ktransformers/server/backend/base.py
rename to archive/ktransformers/server/backend/base.py
diff --git a/ktransformers/server/backend/context_manager.py b/archive/ktransformers/server/backend/context_manager.py
similarity index 100%
rename from ktransformers/server/backend/context_manager.py
rename to archive/ktransformers/server/backend/context_manager.py
diff --git a/ktransformers/server/backend/interfaces/__init__.py b/archive/ktransformers/server/backend/interfaces/__init__.py
similarity index 100%
rename from ktransformers/server/backend/interfaces/__init__.py
rename to archive/ktransformers/server/backend/interfaces/__init__.py
diff --git a/ktransformers/server/backend/interfaces/balance_serve.py b/archive/ktransformers/server/backend/interfaces/balance_serve.py
similarity index 100%
rename from ktransformers/server/backend/interfaces/balance_serve.py
rename to archive/ktransformers/server/backend/interfaces/balance_serve.py
diff --git a/ktransformers/server/backend/interfaces/exllamav2.py b/archive/ktransformers/server/backend/interfaces/exllamav2.py
similarity index 100%
rename from ktransformers/server/backend/interfaces/exllamav2.py
rename to archive/ktransformers/server/backend/interfaces/exllamav2.py
diff --git a/ktransformers/server/backend/interfaces/ktransformers.py b/archive/ktransformers/server/backend/interfaces/ktransformers.py
similarity index 100%
rename from ktransformers/server/backend/interfaces/ktransformers.py
rename to archive/ktransformers/server/backend/interfaces/ktransformers.py
diff --git a/ktransformers/server/backend/interfaces/transformers.py b/archive/ktransformers/server/backend/interfaces/transformers.py
similarity index 100%
rename from ktransformers/server/backend/interfaces/transformers.py
rename to archive/ktransformers/server/backend/interfaces/transformers.py
diff --git a/ktransformers/server/balance_serve/inference/__init__.py b/archive/ktransformers/server/balance_serve/inference/__init__.py
similarity index 100%
rename from ktransformers/server/balance_serve/inference/__init__.py
rename to archive/ktransformers/server/balance_serve/inference/__init__.py
diff --git a/ktransformers/server/balance_serve/inference/config.py b/archive/ktransformers/server/balance_serve/inference/config.py
similarity index 100%
rename from ktransformers/server/balance_serve/inference/config.py
rename to archive/ktransformers/server/balance_serve/inference/config.py
diff --git a/ktransformers/server/balance_serve/inference/distributed/__init__.py b/archive/ktransformers/server/balance_serve/inference/distributed/__init__.py
similarity index 100%
rename from ktransformers/server/balance_serve/inference/distributed/__init__.py
rename to archive/ktransformers/server/balance_serve/inference/distributed/__init__.py
diff --git a/ktransformers/server/balance_serve/inference/distributed/communication_op.py b/archive/ktransformers/server/balance_serve/inference/distributed/communication_op.py
similarity index 100%
rename from ktransformers/server/balance_serve/inference/distributed/communication_op.py
rename to archive/ktransformers/server/balance_serve/inference/distributed/communication_op.py
diff --git a/ktransformers/server/balance_serve/inference/distributed/cuda_wrapper.py b/archive/ktransformers/server/balance_serve/inference/distributed/cuda_wrapper.py
similarity index 100%
rename from ktransformers/server/balance_serve/inference/distributed/cuda_wrapper.py
rename to archive/ktransformers/server/balance_serve/inference/distributed/cuda_wrapper.py
diff --git a/ktransformers/server/balance_serve/inference/distributed/custom_all_reduce.py b/archive/ktransformers/server/balance_serve/inference/distributed/custom_all_reduce.py
similarity index 100%
rename from ktransformers/server/balance_serve/inference/distributed/custom_all_reduce.py
rename to archive/ktransformers/server/balance_serve/inference/distributed/custom_all_reduce.py
diff --git a/ktransformers/server/balance_serve/inference/distributed/custom_all_reduce_utils.py b/archive/ktransformers/server/balance_serve/inference/distributed/custom_all_reduce_utils.py
similarity index 100%
rename from ktransformers/server/balance_serve/inference/distributed/custom_all_reduce_utils.py
rename to archive/ktransformers/server/balance_serve/inference/distributed/custom_all_reduce_utils.py
diff --git a/ktransformers/server/balance_serve/inference/distributed/parallel_state.py b/archive/ktransformers/server/balance_serve/inference/distributed/parallel_state.py
similarity index 100%
rename from ktransformers/server/balance_serve/inference/distributed/parallel_state.py
rename to archive/ktransformers/server/balance_serve/inference/distributed/parallel_state.py
diff --git a/ktransformers/server/balance_serve/inference/distributed/pynccl.py b/archive/ktransformers/server/balance_serve/inference/distributed/pynccl.py
similarity index 100%
rename from ktransformers/server/balance_serve/inference/distributed/pynccl.py
rename to archive/ktransformers/server/balance_serve/inference/distributed/pynccl.py
diff --git a/ktransformers/server/balance_serve/inference/distributed/pynccl_wrapper.py b/archive/ktransformers/server/balance_serve/inference/distributed/pynccl_wrapper.py
similarity index 100%
rename from ktransformers/server/balance_serve/inference/distributed/pynccl_wrapper.py
rename to archive/ktransformers/server/balance_serve/inference/distributed/pynccl_wrapper.py
diff --git a/ktransformers/server/balance_serve/inference/distributed/utils.py b/archive/ktransformers/server/balance_serve/inference/distributed/utils.py
similarity index 100%
rename from ktransformers/server/balance_serve/inference/distributed/utils.py
rename to archive/ktransformers/server/balance_serve/inference/distributed/utils.py
diff --git a/ktransformers/server/balance_serve/inference/forward_batch.py b/archive/ktransformers/server/balance_serve/inference/forward_batch.py
similarity index 100%
rename from ktransformers/server/balance_serve/inference/forward_batch.py
rename to archive/ktransformers/server/balance_serve/inference/forward_batch.py
diff --git a/ktransformers/server/balance_serve/inference/model_runner.py b/archive/ktransformers/server/balance_serve/inference/model_runner.py
similarity index 100%
rename from ktransformers/server/balance_serve/inference/model_runner.py
rename to archive/ktransformers/server/balance_serve/inference/model_runner.py
diff --git a/ktransformers/server/balance_serve/inference/query_manager.py b/archive/ktransformers/server/balance_serve/inference/query_manager.py
similarity index 100%
rename from ktransformers/server/balance_serve/inference/query_manager.py
rename to archive/ktransformers/server/balance_serve/inference/query_manager.py
diff --git a/ktransformers/server/balance_serve/inference/sampling/penaltylib/__init__.py b/archive/ktransformers/server/balance_serve/inference/sampling/penaltylib/__init__.py
similarity index 100%
rename from ktransformers/server/balance_serve/inference/sampling/penaltylib/__init__.py
rename to archive/ktransformers/server/balance_serve/inference/sampling/penaltylib/__init__.py
diff --git a/ktransformers/server/balance_serve/inference/sampling/penaltylib/orchestrator.py b/archive/ktransformers/server/balance_serve/inference/sampling/penaltylib/orchestrator.py
similarity index 100%
rename from ktransformers/server/balance_serve/inference/sampling/penaltylib/orchestrator.py
rename to archive/ktransformers/server/balance_serve/inference/sampling/penaltylib/orchestrator.py
diff --git a/ktransformers/server/balance_serve/inference/sampling/penaltylib/penalizers/frequency_penalty.py b/archive/ktransformers/server/balance_serve/inference/sampling/penaltylib/penalizers/frequency_penalty.py
similarity index 100%
rename from ktransformers/server/balance_serve/inference/sampling/penaltylib/penalizers/frequency_penalty.py
rename to archive/ktransformers/server/balance_serve/inference/sampling/penaltylib/penalizers/frequency_penalty.py
diff --git a/ktransformers/server/balance_serve/inference/sampling/penaltylib/penalizers/min_new_tokens.py b/archive/ktransformers/server/balance_serve/inference/sampling/penaltylib/penalizers/min_new_tokens.py
similarity index 100%
rename from ktransformers/server/balance_serve/inference/sampling/penaltylib/penalizers/min_new_tokens.py
rename to archive/ktransformers/server/balance_serve/inference/sampling/penaltylib/penalizers/min_new_tokens.py
diff --git a/ktransformers/server/balance_serve/inference/sampling/penaltylib/penalizers/presence_penalty.py b/archive/ktransformers/server/balance_serve/inference/sampling/penaltylib/penalizers/presence_penalty.py
similarity index 100%
rename from ktransformers/server/balance_serve/inference/sampling/penaltylib/penalizers/presence_penalty.py
rename to archive/ktransformers/server/balance_serve/inference/sampling/penaltylib/penalizers/presence_penalty.py
diff --git a/ktransformers/server/balance_serve/inference/sampling/penaltylib/penalizers/repetition_penalty.py b/archive/ktransformers/server/balance_serve/inference/sampling/penaltylib/penalizers/repetition_penalty.py
similarity index 100%
rename from ktransformers/server/balance_serve/inference/sampling/penaltylib/penalizers/repetition_penalty.py
rename to archive/ktransformers/server/balance_serve/inference/sampling/penaltylib/penalizers/repetition_penalty.py
diff --git a/ktransformers/server/balance_serve/inference/sampling/sampler.py b/archive/ktransformers/server/balance_serve/inference/sampling/sampler.py
similarity index 100%
rename from ktransformers/server/balance_serve/inference/sampling/sampler.py
rename to archive/ktransformers/server/balance_serve/inference/sampling/sampler.py
diff --git a/ktransformers/server/balance_serve/sched_rpc.py b/archive/ktransformers/server/balance_serve/sched_rpc.py
similarity index 100%
rename from ktransformers/server/balance_serve/sched_rpc.py
rename to archive/ktransformers/server/balance_serve/sched_rpc.py
diff --git a/ktransformers/server/balance_serve/settings.py b/archive/ktransformers/server/balance_serve/settings.py
similarity index 100%
rename from ktransformers/server/balance_serve/settings.py
rename to archive/ktransformers/server/balance_serve/settings.py
diff --git a/ktransformers/server/config/config.py b/archive/ktransformers/server/config/config.py
similarity index 100%
rename from ktransformers/server/config/config.py
rename to archive/ktransformers/server/config/config.py
diff --git a/ktransformers/server/config/log.py b/archive/ktransformers/server/config/log.py
similarity index 100%
rename from ktransformers/server/config/log.py
rename to archive/ktransformers/server/config/log.py
diff --git a/ktransformers/server/config/singleton.py b/archive/ktransformers/server/config/singleton.py
similarity index 100%
rename from ktransformers/server/config/singleton.py
rename to archive/ktransformers/server/config/singleton.py
diff --git a/ktransformers/server/crud/__init__.py b/archive/ktransformers/server/crud/__init__.py
similarity index 100%
rename from ktransformers/server/crud/__init__.py
rename to archive/ktransformers/server/crud/__init__.py
diff --git a/ktransformers/server/crud/assistants/__init__.py b/archive/ktransformers/server/crud/assistants/__init__.py
similarity index 100%
rename from ktransformers/server/crud/assistants/__init__.py
rename to archive/ktransformers/server/crud/assistants/__init__.py
diff --git a/ktransformers/server/crud/assistants/assistants.py b/archive/ktransformers/server/crud/assistants/assistants.py
similarity index 100%
rename from ktransformers/server/crud/assistants/assistants.py
rename to archive/ktransformers/server/crud/assistants/assistants.py
diff --git a/ktransformers/server/crud/assistants/messages.py b/archive/ktransformers/server/crud/assistants/messages.py
similarity index 100%
rename from ktransformers/server/crud/assistants/messages.py
rename to archive/ktransformers/server/crud/assistants/messages.py
diff --git a/ktransformers/server/crud/assistants/runs.py b/archive/ktransformers/server/crud/assistants/runs.py
similarity index 100%
rename from ktransformers/server/crud/assistants/runs.py
rename to archive/ktransformers/server/crud/assistants/runs.py
diff --git a/ktransformers/server/crud/assistants/threads.py b/archive/ktransformers/server/crud/assistants/threads.py
similarity index 100%
rename from ktransformers/server/crud/assistants/threads.py
rename to archive/ktransformers/server/crud/assistants/threads.py
diff --git a/ktransformers/server/exceptions.py b/archive/ktransformers/server/exceptions.py
similarity index 100%
rename from ktransformers/server/exceptions.py
rename to archive/ktransformers/server/exceptions.py
diff --git a/ktransformers/server/main.py b/archive/ktransformers/server/main.py
similarity index 100%
rename from ktransformers/server/main.py
rename to archive/ktransformers/server/main.py
diff --git a/ktransformers/server/models/__init__.py b/archive/ktransformers/server/models/__init__.py
similarity index 100%
rename from ktransformers/server/models/__init__.py
rename to archive/ktransformers/server/models/__init__.py
diff --git a/ktransformers/server/models/assistants/__init__.py b/archive/ktransformers/server/models/assistants/__init__.py
similarity index 100%
rename from ktransformers/server/models/assistants/__init__.py
rename to archive/ktransformers/server/models/assistants/__init__.py
diff --git a/ktransformers/server/models/assistants/assistants.py b/archive/ktransformers/server/models/assistants/assistants.py
similarity index 100%
rename from ktransformers/server/models/assistants/assistants.py
rename to archive/ktransformers/server/models/assistants/assistants.py
diff --git a/ktransformers/server/models/assistants/messages.py b/archive/ktransformers/server/models/assistants/messages.py
similarity index 100%
rename from ktransformers/server/models/assistants/messages.py
rename to archive/ktransformers/server/models/assistants/messages.py
diff --git a/ktransformers/server/models/assistants/run_steps.py b/archive/ktransformers/server/models/assistants/run_steps.py
similarity index 100%
rename from ktransformers/server/models/assistants/run_steps.py
rename to archive/ktransformers/server/models/assistants/run_steps.py
diff --git a/ktransformers/server/models/assistants/runs.py b/archive/ktransformers/server/models/assistants/runs.py
similarity index 100%
rename from ktransformers/server/models/assistants/runs.py
rename to archive/ktransformers/server/models/assistants/runs.py
diff --git a/ktransformers/server/models/assistants/threads.py b/archive/ktransformers/server/models/assistants/threads.py
similarity index 100%
rename from ktransformers/server/models/assistants/threads.py
rename to archive/ktransformers/server/models/assistants/threads.py
diff --git a/ktransformers/server/requirements.txt b/archive/ktransformers/server/requirements.txt
similarity index 100%
rename from ktransformers/server/requirements.txt
rename to archive/ktransformers/server/requirements.txt
diff --git a/ktransformers/server/schemas/__init__.py b/archive/ktransformers/server/schemas/__init__.py
similarity index 100%
rename from ktransformers/server/schemas/__init__.py
rename to archive/ktransformers/server/schemas/__init__.py
diff --git a/ktransformers/server/schemas/assistants/__init__.py b/archive/ktransformers/server/schemas/assistants/__init__.py
similarity index 100%
rename from ktransformers/server/schemas/assistants/__init__.py
rename to archive/ktransformers/server/schemas/assistants/__init__.py
diff --git a/ktransformers/server/schemas/assistants/assistants.py b/archive/ktransformers/server/schemas/assistants/assistants.py
similarity index 100%
rename from ktransformers/server/schemas/assistants/assistants.py
rename to archive/ktransformers/server/schemas/assistants/assistants.py
diff --git a/ktransformers/server/schemas/assistants/messages.py b/archive/ktransformers/server/schemas/assistants/messages.py
similarity index 100%
rename from ktransformers/server/schemas/assistants/messages.py
rename to archive/ktransformers/server/schemas/assistants/messages.py
diff --git a/ktransformers/server/schemas/assistants/runs.py b/archive/ktransformers/server/schemas/assistants/runs.py
similarity index 100%
rename from ktransformers/server/schemas/assistants/runs.py
rename to archive/ktransformers/server/schemas/assistants/runs.py
diff --git a/ktransformers/server/schemas/assistants/streaming.py b/archive/ktransformers/server/schemas/assistants/streaming.py
similarity index 100%
rename from ktransformers/server/schemas/assistants/streaming.py
rename to archive/ktransformers/server/schemas/assistants/streaming.py
diff --git a/ktransformers/server/schemas/assistants/threads.py b/archive/ktransformers/server/schemas/assistants/threads.py
similarity index 100%
rename from ktransformers/server/schemas/assistants/threads.py
rename to archive/ktransformers/server/schemas/assistants/threads.py
diff --git a/ktransformers/server/schemas/assistants/tool.py b/archive/ktransformers/server/schemas/assistants/tool.py
similarity index 100%
rename from ktransformers/server/schemas/assistants/tool.py
rename to archive/ktransformers/server/schemas/assistants/tool.py
diff --git a/ktransformers/server/schemas/base.py b/archive/ktransformers/server/schemas/base.py
similarity index 100%
rename from ktransformers/server/schemas/base.py
rename to archive/ktransformers/server/schemas/base.py
diff --git a/ktransformers/server/schemas/conversation.py b/archive/ktransformers/server/schemas/conversation.py
similarity index 100%
rename from ktransformers/server/schemas/conversation.py
rename to archive/ktransformers/server/schemas/conversation.py
diff --git a/ktransformers/server/schemas/endpoints/chat.py b/archive/ktransformers/server/schemas/endpoints/chat.py
similarity index 100%
rename from ktransformers/server/schemas/endpoints/chat.py
rename to archive/ktransformers/server/schemas/endpoints/chat.py
diff --git a/ktransformers/server/schemas/legacy/__init__.py b/archive/ktransformers/server/schemas/legacy/__init__.py
similarity index 100%
rename from ktransformers/server/schemas/legacy/__init__.py
rename to archive/ktransformers/server/schemas/legacy/__init__.py
diff --git a/ktransformers/server/schemas/legacy/completions.py b/archive/ktransformers/server/schemas/legacy/completions.py
similarity index 100%
rename from ktransformers/server/schemas/legacy/completions.py
rename to archive/ktransformers/server/schemas/legacy/completions.py
diff --git a/ktransformers/server/utils/__init__.py b/archive/ktransformers/server/utils/__init__.py
similarity index 100%
rename from ktransformers/server/utils/__init__.py
rename to archive/ktransformers/server/utils/__init__.py
diff --git a/ktransformers/server/utils/create_interface.py b/archive/ktransformers/server/utils/create_interface.py
similarity index 100%
rename from ktransformers/server/utils/create_interface.py
rename to archive/ktransformers/server/utils/create_interface.py
diff --git a/ktransformers/server/utils/multi_timer.py b/archive/ktransformers/server/utils/multi_timer.py
similarity index 100%
rename from ktransformers/server/utils/multi_timer.py
rename to archive/ktransformers/server/utils/multi_timer.py
diff --git a/ktransformers/server/utils/serve_profiling.py b/archive/ktransformers/server/utils/serve_profiling.py
similarity index 100%
rename from ktransformers/server/utils/serve_profiling.py
rename to archive/ktransformers/server/utils/serve_profiling.py
diff --git a/ktransformers/server/utils/sql_utils.py b/archive/ktransformers/server/utils/sql_utils.py
similarity index 100%
rename from ktransformers/server/utils/sql_utils.py
rename to archive/ktransformers/server/utils/sql_utils.py
diff --git a/ktransformers/tests/.gitignore b/archive/ktransformers/tests/.gitignore
similarity index 100%
rename from ktransformers/tests/.gitignore
rename to archive/ktransformers/tests/.gitignore
diff --git a/ktransformers/tests/AIME_2024/eval_api.py b/archive/ktransformers/tests/AIME_2024/eval_api.py
similarity index 100%
rename from ktransformers/tests/AIME_2024/eval_api.py
rename to archive/ktransformers/tests/AIME_2024/eval_api.py
diff --git a/ktransformers/tests/AIME_2024/evaluation.py b/archive/ktransformers/tests/AIME_2024/evaluation.py
similarity index 100%
rename from ktransformers/tests/AIME_2024/evaluation.py
rename to archive/ktransformers/tests/AIME_2024/evaluation.py
diff --git a/ktransformers/tests/AIME_2024/prompts.py b/archive/ktransformers/tests/AIME_2024/prompts.py
similarity index 100%
rename from ktransformers/tests/AIME_2024/prompts.py
rename to archive/ktransformers/tests/AIME_2024/prompts.py
diff --git a/ktransformers/tests/dequant_gpu.py b/archive/ktransformers/tests/dequant_gpu.py
similarity index 100%
rename from ktransformers/tests/dequant_gpu.py
rename to archive/ktransformers/tests/dequant_gpu.py
diff --git a/ktransformers/tests/dequant_gpu_t.py b/archive/ktransformers/tests/dequant_gpu_t.py
similarity index 100%
rename from ktransformers/tests/dequant_gpu_t.py
rename to archive/ktransformers/tests/dequant_gpu_t.py
diff --git a/ktransformers/tests/function_call_test.py b/archive/ktransformers/tests/function_call_test.py
similarity index 100%
rename from ktransformers/tests/function_call_test.py
rename to archive/ktransformers/tests/function_call_test.py
diff --git a/ktransformers/tests/humaneval/eval_api.py b/archive/ktransformers/tests/humaneval/eval_api.py
similarity index 100%
rename from ktransformers/tests/humaneval/eval_api.py
rename to archive/ktransformers/tests/humaneval/eval_api.py
diff --git a/ktransformers/tests/humaneval/evaluation.py b/archive/ktransformers/tests/humaneval/evaluation.py
similarity index 100%
rename from ktransformers/tests/humaneval/evaluation.py
rename to archive/ktransformers/tests/humaneval/evaluation.py
diff --git a/ktransformers/tests/humaneval/prompts.py b/archive/ktransformers/tests/humaneval/prompts.py
similarity index 100%
rename from ktransformers/tests/humaneval/prompts.py
rename to archive/ktransformers/tests/humaneval/prompts.py
diff --git a/ktransformers/tests/mmlu_pro_test.py b/archive/ktransformers/tests/mmlu_pro_test.py
similarity index 100%
rename from ktransformers/tests/mmlu_pro_test.py
rename to archive/ktransformers/tests/mmlu_pro_test.py
diff --git a/ktransformers/tests/mmlu_test.py b/archive/ktransformers/tests/mmlu_test.py
similarity index 100%
rename from ktransformers/tests/mmlu_test.py
rename to archive/ktransformers/tests/mmlu_test.py
diff --git a/ktransformers/tests/mmlu_test_multi.py b/archive/ktransformers/tests/mmlu_test_multi.py
similarity index 100%
rename from ktransformers/tests/mmlu_test_multi.py
rename to archive/ktransformers/tests/mmlu_test_multi.py
diff --git a/ktransformers/tests/score.py b/archive/ktransformers/tests/score.py
similarity index 100%
rename from ktransformers/tests/score.py
rename to archive/ktransformers/tests/score.py
diff --git a/ktransformers/tests/test_client.py b/archive/ktransformers/tests/test_client.py
similarity index 100%
rename from ktransformers/tests/test_client.py
rename to archive/ktransformers/tests/test_client.py
diff --git a/ktransformers/tests/test_prefix.py b/archive/ktransformers/tests/test_prefix.py
similarity index 100%
rename from ktransformers/tests/test_prefix.py
rename to archive/ktransformers/tests/test_prefix.py
diff --git a/ktransformers/tests/test_pytorch_q8.py b/archive/ktransformers/tests/test_pytorch_q8.py
similarity index 100%
rename from ktransformers/tests/test_pytorch_q8.py
rename to archive/ktransformers/tests/test_pytorch_q8.py
diff --git a/ktransformers/tests/test_speed.py b/archive/ktransformers/tests/test_speed.py
similarity index 100%
rename from ktransformers/tests/test_speed.py
rename to archive/ktransformers/tests/test_speed.py
diff --git a/ktransformers/tests/triton_fp8gemm_test.py b/archive/ktransformers/tests/triton_fp8gemm_test.py
similarity index 100%
rename from ktransformers/tests/triton_fp8gemm_test.py
rename to archive/ktransformers/tests/triton_fp8gemm_test.py
diff --git a/ktransformers/util/ascend/ascend_utils.py b/archive/ktransformers/util/ascend/ascend_utils.py
similarity index 100%
rename from ktransformers/util/ascend/ascend_utils.py
rename to archive/ktransformers/util/ascend/ascend_utils.py
diff --git a/ktransformers/util/cuda_graph_runner.py b/archive/ktransformers/util/cuda_graph_runner.py
similarity index 100%
rename from ktransformers/util/cuda_graph_runner.py
rename to archive/ktransformers/util/cuda_graph_runner.py
diff --git a/ktransformers/util/custom_gguf.py b/archive/ktransformers/util/custom_gguf.py
similarity index 100%
rename from ktransformers/util/custom_gguf.py
rename to archive/ktransformers/util/custom_gguf.py
diff --git a/ktransformers/util/custom_loader.py b/archive/ktransformers/util/custom_loader.py
similarity index 100%
rename from ktransformers/util/custom_loader.py
rename to archive/ktransformers/util/custom_loader.py
diff --git a/ktransformers/util/modeling_rope_utils.py b/archive/ktransformers/util/modeling_rope_utils.py
similarity index 100%
rename from ktransformers/util/modeling_rope_utils.py
rename to archive/ktransformers/util/modeling_rope_utils.py
diff --git a/ktransformers/util/npu_graph_runner.py b/archive/ktransformers/util/npu_graph_runner.py
similarity index 100%
rename from ktransformers/util/npu_graph_runner.py
rename to archive/ktransformers/util/npu_graph_runner.py
diff --git a/ktransformers/util/textstream.py b/archive/ktransformers/util/textstream.py
similarity index 100%
rename from ktransformers/util/textstream.py
rename to archive/ktransformers/util/textstream.py
diff --git a/ktransformers/util/utils.py b/archive/ktransformers/util/utils.py
similarity index 100%
rename from ktransformers/util/utils.py
rename to archive/ktransformers/util/utils.py
diff --git a/ktransformers/util/vendors.py b/archive/ktransformers/util/vendors.py
similarity index 100%
rename from ktransformers/util/vendors.py
rename to archive/ktransformers/util/vendors.py
diff --git a/ktransformers/util/weight_loader.py b/archive/ktransformers/util/weight_loader.py
similarity index 100%
rename from ktransformers/util/weight_loader.py
rename to archive/ktransformers/util/weight_loader.py
diff --git a/ktransformers/website/.browserslistrc b/archive/ktransformers/website/.browserslistrc
similarity index 100%
rename from ktransformers/website/.browserslistrc
rename to archive/ktransformers/website/.browserslistrc
diff --git a/ktransformers/website/.eslintrc.js b/archive/ktransformers/website/.eslintrc.js
similarity index 100%
rename from ktransformers/website/.eslintrc.js
rename to archive/ktransformers/website/.eslintrc.js
diff --git a/ktransformers/website/.gitignore b/archive/ktransformers/website/.gitignore
similarity index 100%
rename from ktransformers/website/.gitignore
rename to archive/ktransformers/website/.gitignore
diff --git a/ktransformers/website/README.md b/archive/ktransformers/website/README.md
similarity index 100%
rename from ktransformers/website/README.md
rename to archive/ktransformers/website/README.md
diff --git a/ktransformers/website/config.d.ts b/archive/ktransformers/website/config.d.ts
similarity index 100%
rename from ktransformers/website/config.d.ts
rename to archive/ktransformers/website/config.d.ts
diff --git a/ktransformers/website/jest.config.js b/archive/ktransformers/website/jest.config.js
similarity index 100%
rename from ktransformers/website/jest.config.js
rename to archive/ktransformers/website/jest.config.js
diff --git a/ktransformers/website/package-lock.json b/archive/ktransformers/website/package-lock.json
similarity index 100%
rename from ktransformers/website/package-lock.json
rename to archive/ktransformers/website/package-lock.json
diff --git a/ktransformers/website/package.json b/archive/ktransformers/website/package.json
similarity index 100%
rename from ktransformers/website/package.json
rename to archive/ktransformers/website/package.json
diff --git a/ktransformers/website/public/balck.ico b/archive/ktransformers/website/public/balck.ico
similarity index 100%
rename from ktransformers/website/public/balck.ico
rename to archive/ktransformers/website/public/balck.ico
diff --git a/ktransformers/website/public/config.js b/archive/ktransformers/website/public/config.js
similarity index 100%
rename from ktransformers/website/public/config.js
rename to archive/ktransformers/website/public/config.js
diff --git a/ktransformers/website/public/css/reset.css b/archive/ktransformers/website/public/css/reset.css
similarity index 100%
rename from ktransformers/website/public/css/reset.css
rename to archive/ktransformers/website/public/css/reset.css
diff --git a/ktransformers/website/public/images/assistant-avatar.png b/archive/ktransformers/website/public/images/assistant-avatar.png
similarity index 100%
rename from ktransformers/website/public/images/assistant-avatar.png
rename to archive/ktransformers/website/public/images/assistant-avatar.png
diff --git a/ktransformers/website/public/images/avatar.png b/archive/ktransformers/website/public/images/avatar.png
similarity index 100%
rename from ktransformers/website/public/images/avatar.png
rename to archive/ktransformers/website/public/images/avatar.png
diff --git a/ktransformers/website/public/images/bgbg.png b/archive/ktransformers/website/public/images/bgbg.png
similarity index 100%
rename from ktransformers/website/public/images/bgbg.png
rename to archive/ktransformers/website/public/images/bgbg.png
diff --git a/ktransformers/website/public/images/logo.ico b/archive/ktransformers/website/public/images/logo.ico
similarity index 100%
rename from ktransformers/website/public/images/logo.ico
rename to archive/ktransformers/website/public/images/logo.ico
diff --git a/ktransformers/website/public/images/logo.png b/archive/ktransformers/website/public/images/logo.png
similarity index 100%
rename from ktransformers/website/public/images/logo.png
rename to archive/ktransformers/website/public/images/logo.png
diff --git a/ktransformers/website/public/images/three.png b/archive/ktransformers/website/public/images/three.png
similarity index 100%
rename from ktransformers/website/public/images/three.png
rename to archive/ktransformers/website/public/images/three.png
diff --git a/ktransformers/website/public/images/user-filling.png b/archive/ktransformers/website/public/images/user-filling.png
similarity index 100%
rename from ktransformers/website/public/images/user-filling.png
rename to archive/ktransformers/website/public/images/user-filling.png
diff --git a/ktransformers/website/public/index.html b/archive/ktransformers/website/public/index.html
similarity index 100%
rename from ktransformers/website/public/index.html
rename to archive/ktransformers/website/public/index.html
diff --git a/ktransformers/website/src/App.vue b/archive/ktransformers/website/src/App.vue
similarity index 100%
rename from ktransformers/website/src/App.vue
rename to archive/ktransformers/website/src/App.vue
diff --git a/ktransformers/website/src/api/api-client.ts b/archive/ktransformers/website/src/api/api-client.ts
similarity index 100%
rename from ktransformers/website/src/api/api-client.ts
rename to archive/ktransformers/website/src/api/api-client.ts
diff --git a/ktransformers/website/src/api/assistant.ts b/archive/ktransformers/website/src/api/assistant.ts
similarity index 100%
rename from ktransformers/website/src/api/assistant.ts
rename to archive/ktransformers/website/src/api/assistant.ts
diff --git a/ktransformers/website/src/api/message.ts b/archive/ktransformers/website/src/api/message.ts
similarity index 100%
rename from ktransformers/website/src/api/message.ts
rename to archive/ktransformers/website/src/api/message.ts
diff --git a/ktransformers/website/src/api/run.ts b/archive/ktransformers/website/src/api/run.ts
similarity index 100%
rename from ktransformers/website/src/api/run.ts
rename to archive/ktransformers/website/src/api/run.ts
diff --git a/ktransformers/website/src/api/thread.ts b/archive/ktransformers/website/src/api/thread.ts
similarity index 100%
rename from ktransformers/website/src/api/thread.ts
rename to archive/ktransformers/website/src/api/thread.ts
diff --git a/ktransformers/website/src/assets/css/mixins.styl b/archive/ktransformers/website/src/assets/css/mixins.styl
similarity index 100%
rename from ktransformers/website/src/assets/css/mixins.styl
rename to archive/ktransformers/website/src/assets/css/mixins.styl
diff --git a/ktransformers/website/src/assets/iconfont/demo.css b/archive/ktransformers/website/src/assets/iconfont/demo.css
similarity index 100%
rename from ktransformers/website/src/assets/iconfont/demo.css
rename to archive/ktransformers/website/src/assets/iconfont/demo.css
diff --git a/ktransformers/website/src/assets/iconfont/demo_index.html b/archive/ktransformers/website/src/assets/iconfont/demo_index.html
similarity index 100%
rename from ktransformers/website/src/assets/iconfont/demo_index.html
rename to archive/ktransformers/website/src/assets/iconfont/demo_index.html
diff --git a/ktransformers/website/src/assets/iconfont/iconfont.css b/archive/ktransformers/website/src/assets/iconfont/iconfont.css
similarity index 100%
rename from ktransformers/website/src/assets/iconfont/iconfont.css
rename to archive/ktransformers/website/src/assets/iconfont/iconfont.css
diff --git a/ktransformers/website/src/assets/iconfont/iconfont.js b/archive/ktransformers/website/src/assets/iconfont/iconfont.js
similarity index 100%
rename from ktransformers/website/src/assets/iconfont/iconfont.js
rename to archive/ktransformers/website/src/assets/iconfont/iconfont.js
diff --git a/ktransformers/website/src/assets/iconfont/iconfont.json b/archive/ktransformers/website/src/assets/iconfont/iconfont.json
similarity index 100%
rename from ktransformers/website/src/assets/iconfont/iconfont.json
rename to archive/ktransformers/website/src/assets/iconfont/iconfont.json
diff --git a/ktransformers/website/src/assets/iconfont/iconfont.svg b/archive/ktransformers/website/src/assets/iconfont/iconfont.svg
similarity index 100%
rename from ktransformers/website/src/assets/iconfont/iconfont.svg
rename to archive/ktransformers/website/src/assets/iconfont/iconfont.svg
diff --git a/ktransformers/website/src/assets/iconfont/iconfont.ttf b/archive/ktransformers/website/src/assets/iconfont/iconfont.ttf
similarity index 100%
rename from ktransformers/website/src/assets/iconfont/iconfont.ttf
rename to archive/ktransformers/website/src/assets/iconfont/iconfont.ttf
diff --git a/ktransformers/website/src/assets/iconfont/iconfont.woff b/archive/ktransformers/website/src/assets/iconfont/iconfont.woff
similarity index 100%
rename from ktransformers/website/src/assets/iconfont/iconfont.woff
rename to archive/ktransformers/website/src/assets/iconfont/iconfont.woff
diff --git a/ktransformers/website/src/assets/iconfont/iconfont.woff2 b/archive/ktransformers/website/src/assets/iconfont/iconfont.woff2
similarity index 100%
rename from ktransformers/website/src/assets/iconfont/iconfont.woff2
rename to archive/ktransformers/website/src/assets/iconfont/iconfont.woff2
diff --git a/ktransformers/website/src/components/chat/index.vue b/archive/ktransformers/website/src/components/chat/index.vue
similarity index 100%
rename from ktransformers/website/src/components/chat/index.vue
rename to archive/ktransformers/website/src/components/chat/index.vue
diff --git a/ktransformers/website/src/conf/config.ts b/archive/ktransformers/website/src/conf/config.ts
similarity index 100%
rename from ktransformers/website/src/conf/config.ts
rename to archive/ktransformers/website/src/conf/config.ts
diff --git a/ktransformers/website/src/locals/en.js b/archive/ktransformers/website/src/locals/en.js
similarity index 100%
rename from ktransformers/website/src/locals/en.js
rename to archive/ktransformers/website/src/locals/en.js
diff --git a/ktransformers/website/src/locals/index.js b/archive/ktransformers/website/src/locals/index.js
similarity index 100%
rename from ktransformers/website/src/locals/index.js
rename to archive/ktransformers/website/src/locals/index.js
diff --git a/ktransformers/website/src/locals/zh.js b/archive/ktransformers/website/src/locals/zh.js
similarity index 100%
rename from ktransformers/website/src/locals/zh.js
rename to archive/ktransformers/website/src/locals/zh.js
diff --git a/ktransformers/website/src/main.ts b/archive/ktransformers/website/src/main.ts
similarity index 100%
rename from ktransformers/website/src/main.ts
rename to archive/ktransformers/website/src/main.ts
diff --git a/ktransformers/website/src/router/index.ts b/archive/ktransformers/website/src/router/index.ts
similarity index 100%
rename from ktransformers/website/src/router/index.ts
rename to archive/ktransformers/website/src/router/index.ts
diff --git a/ktransformers/website/src/shims-vue.d.ts b/archive/ktransformers/website/src/shims-vue.d.ts
similarity index 100%
rename from ktransformers/website/src/shims-vue.d.ts
rename to archive/ktransformers/website/src/shims-vue.d.ts
diff --git a/ktransformers/website/src/store/index.ts b/archive/ktransformers/website/src/store/index.ts
similarity index 100%
rename from ktransformers/website/src/store/index.ts
rename to archive/ktransformers/website/src/store/index.ts
diff --git a/ktransformers/website/src/utils/copy.ts b/archive/ktransformers/website/src/utils/copy.ts
similarity index 100%
rename from ktransformers/website/src/utils/copy.ts
rename to archive/ktransformers/website/src/utils/copy.ts
diff --git a/ktransformers/website/src/utils/types.ts b/archive/ktransformers/website/src/utils/types.ts
similarity index 100%
rename from ktransformers/website/src/utils/types.ts
rename to archive/ktransformers/website/src/utils/types.ts
diff --git a/ktransformers/website/src/views/home.vue b/archive/ktransformers/website/src/views/home.vue
similarity index 100%
rename from ktransformers/website/src/views/home.vue
rename to archive/ktransformers/website/src/views/home.vue
diff --git a/ktransformers/website/tests/unit/example.spec.ts b/archive/ktransformers/website/tests/unit/example.spec.ts
similarity index 100%
rename from ktransformers/website/tests/unit/example.spec.ts
rename to archive/ktransformers/website/tests/unit/example.spec.ts
diff --git a/ktransformers/website/tsconfig.json b/archive/ktransformers/website/tsconfig.json
similarity index 100%
rename from ktransformers/website/tsconfig.json
rename to archive/ktransformers/website/tsconfig.json
diff --git a/ktransformers/website/vue.config.js b/archive/ktransformers/website/vue.config.js
similarity index 100%
rename from ktransformers/website/vue.config.js
rename to archive/ktransformers/website/vue.config.js
diff --git a/merge_tensors/merge_safetensor_gguf.py b/archive/merge_tensors/merge_safetensor_gguf.py
similarity index 100%
rename from merge_tensors/merge_safetensor_gguf.py
rename to archive/merge_tensors/merge_safetensor_gguf.py
diff --git a/pyproject.toml b/archive/pyproject.toml
similarity index 100%
rename from pyproject.toml
rename to archive/pyproject.toml
diff --git a/requirements-local_chat.txt b/archive/requirements-local_chat.txt
similarity index 100%
rename from requirements-local_chat.txt
rename to archive/requirements-local_chat.txt
diff --git a/setup.py b/archive/setup.py
similarity index 100%
rename from setup.py
rename to archive/setup.py
diff --git a/third_party/PhotonLibOS b/archive/third_party/PhotonLibOS
similarity index 100%
rename from third_party/PhotonLibOS
rename to archive/third_party/PhotonLibOS
diff --git a/third_party/custom_flashinfer b/archive/third_party/custom_flashinfer
similarity index 100%
rename from third_party/custom_flashinfer
rename to archive/third_party/custom_flashinfer
diff --git a/third_party/llama.cpp b/archive/third_party/llama.cpp
similarity index 100%
rename from third_party/llama.cpp
rename to archive/third_party/llama.cpp
diff --git a/third_party/llamafile/README.md b/archive/third_party/llamafile/README.md
similarity index 100%
rename from third_party/llamafile/README.md
rename to archive/third_party/llamafile/README.md
diff --git a/third_party/llamafile/bench.h b/archive/third_party/llamafile/bench.h
similarity index 100%
rename from third_party/llamafile/bench.h
rename to archive/third_party/llamafile/bench.h
diff --git a/third_party/llamafile/flags.cpp b/archive/third_party/llamafile/flags.cpp
similarity index 100%
rename from third_party/llamafile/flags.cpp
rename to archive/third_party/llamafile/flags.cpp
diff --git a/third_party/llamafile/flags.h b/archive/third_party/llamafile/flags.h
similarity index 100%
rename from third_party/llamafile/flags.h
rename to archive/third_party/llamafile/flags.h
diff --git a/third_party/llamafile/iqk_mul_mat.inc b/archive/third_party/llamafile/iqk_mul_mat.inc
similarity index 100%
rename from third_party/llamafile/iqk_mul_mat.inc
rename to archive/third_party/llamafile/iqk_mul_mat.inc
diff --git a/third_party/llamafile/iqk_mul_mat_amd_avx2.cpp b/archive/third_party/llamafile/iqk_mul_mat_amd_avx2.cpp
similarity index 100%
rename from third_party/llamafile/iqk_mul_mat_amd_avx2.cpp
rename to archive/third_party/llamafile/iqk_mul_mat_amd_avx2.cpp
diff --git a/third_party/llamafile/iqk_mul_mat_amd_zen4.cpp b/archive/third_party/llamafile/iqk_mul_mat_amd_zen4.cpp
similarity index 100%
rename from third_party/llamafile/iqk_mul_mat_amd_zen4.cpp
rename to archive/third_party/llamafile/iqk_mul_mat_amd_zen4.cpp
diff --git a/third_party/llamafile/iqk_mul_mat_arm.inc b/archive/third_party/llamafile/iqk_mul_mat_arm.inc
similarity index 100%
rename from third_party/llamafile/iqk_mul_mat_arm.inc
rename to archive/third_party/llamafile/iqk_mul_mat_arm.inc
diff --git a/third_party/llamafile/iqk_mul_mat_arm82.cpp b/archive/third_party/llamafile/iqk_mul_mat_arm82.cpp
similarity index 100%
rename from third_party/llamafile/iqk_mul_mat_arm82.cpp
rename to archive/third_party/llamafile/iqk_mul_mat_arm82.cpp
diff --git a/third_party/llamafile/iqk_mul_mat_x86.inc b/archive/third_party/llamafile/iqk_mul_mat_x86.inc
similarity index 100%
rename from third_party/llamafile/iqk_mul_mat_x86.inc
rename to archive/third_party/llamafile/iqk_mul_mat_x86.inc
diff --git a/third_party/llamafile/macros.h b/archive/third_party/llamafile/macros.h
similarity index 100%
rename from third_party/llamafile/macros.h
rename to archive/third_party/llamafile/macros.h
diff --git a/third_party/llamafile/micros.h b/archive/third_party/llamafile/micros.h
similarity index 100%
rename from third_party/llamafile/micros.h
rename to archive/third_party/llamafile/micros.h
diff --git a/third_party/llamafile/numba.h b/archive/third_party/llamafile/numba.h
similarity index 100%
rename from third_party/llamafile/numba.h
rename to archive/third_party/llamafile/numba.h
diff --git a/third_party/llamafile/sgemm.cpp b/archive/third_party/llamafile/sgemm.cpp
similarity index 100%
rename from third_party/llamafile/sgemm.cpp
rename to archive/third_party/llamafile/sgemm.cpp
diff --git a/third_party/llamafile/sgemm.h b/archive/third_party/llamafile/sgemm.h
similarity index 100%
rename from third_party/llamafile/sgemm.h
rename to archive/third_party/llamafile/sgemm.h
diff --git a/third_party/llamafile/sgemm_arm.cpp b/archive/third_party/llamafile/sgemm_arm.cpp
similarity index 100%
rename from third_party/llamafile/sgemm_arm.cpp
rename to archive/third_party/llamafile/sgemm_arm.cpp
diff --git a/third_party/llamafile/sgemm_x86.cpp b/archive/third_party/llamafile/sgemm_x86.cpp
similarity index 100%
rename from third_party/llamafile/sgemm_x86.cpp
rename to archive/third_party/llamafile/sgemm_x86.cpp
diff --git a/third_party/llamafile/tinyblas_cpu.h b/archive/third_party/llamafile/tinyblas_cpu.h
similarity index 100%
rename from third_party/llamafile/tinyblas_cpu.h
rename to archive/third_party/llamafile/tinyblas_cpu.h
diff --git a/third_party/llamafile/tinyblas_cpu_mixmul.inc b/archive/third_party/llamafile/tinyblas_cpu_mixmul.inc
similarity index 100%
rename from third_party/llamafile/tinyblas_cpu_mixmul.inc
rename to archive/third_party/llamafile/tinyblas_cpu_mixmul.inc
diff --git a/third_party/llamafile/tinyblas_cpu_mixmul_amd_avx.cpp b/archive/third_party/llamafile/tinyblas_cpu_mixmul_amd_avx.cpp
similarity index 100%
rename from third_party/llamafile/tinyblas_cpu_mixmul_amd_avx.cpp
rename to archive/third_party/llamafile/tinyblas_cpu_mixmul_amd_avx.cpp
diff --git a/third_party/llamafile/tinyblas_cpu_mixmul_amd_avx2.cpp b/archive/third_party/llamafile/tinyblas_cpu_mixmul_amd_avx2.cpp
similarity index 100%
rename from third_party/llamafile/tinyblas_cpu_mixmul_amd_avx2.cpp
rename to archive/third_party/llamafile/tinyblas_cpu_mixmul_amd_avx2.cpp
diff --git a/third_party/llamafile/tinyblas_cpu_mixmul_amd_avx512f.cpp b/archive/third_party/llamafile/tinyblas_cpu_mixmul_amd_avx512f.cpp
similarity index 100%
rename from third_party/llamafile/tinyblas_cpu_mixmul_amd_avx512f.cpp
rename to archive/third_party/llamafile/tinyblas_cpu_mixmul_amd_avx512f.cpp
diff --git a/third_party/llamafile/tinyblas_cpu_mixmul_amd_avxvnni.cpp b/archive/third_party/llamafile/tinyblas_cpu_mixmul_amd_avxvnni.cpp
similarity index 100%
rename from third_party/llamafile/tinyblas_cpu_mixmul_amd_avxvnni.cpp
rename to archive/third_party/llamafile/tinyblas_cpu_mixmul_amd_avxvnni.cpp
diff --git a/third_party/llamafile/tinyblas_cpu_mixmul_amd_fma.cpp b/archive/third_party/llamafile/tinyblas_cpu_mixmul_amd_fma.cpp
similarity index 100%
rename from third_party/llamafile/tinyblas_cpu_mixmul_amd_fma.cpp
rename to archive/third_party/llamafile/tinyblas_cpu_mixmul_amd_fma.cpp
diff --git a/third_party/llamafile/tinyblas_cpu_mixmul_amd_zen4.cpp b/archive/third_party/llamafile/tinyblas_cpu_mixmul_amd_zen4.cpp
similarity index 100%
rename from third_party/llamafile/tinyblas_cpu_mixmul_amd_zen4.cpp
rename to archive/third_party/llamafile/tinyblas_cpu_mixmul_amd_zen4.cpp
diff --git a/third_party/llamafile/tinyblas_cpu_mixmul_arm80.cpp b/archive/third_party/llamafile/tinyblas_cpu_mixmul_arm80.cpp
similarity index 100%
rename from third_party/llamafile/tinyblas_cpu_mixmul_arm80.cpp
rename to archive/third_party/llamafile/tinyblas_cpu_mixmul_arm80.cpp
diff --git a/third_party/llamafile/tinyblas_cpu_mixmul_arm82.cpp b/archive/third_party/llamafile/tinyblas_cpu_mixmul_arm82.cpp
similarity index 100%
rename from third_party/llamafile/tinyblas_cpu_mixmul_arm82.cpp
rename to archive/third_party/llamafile/tinyblas_cpu_mixmul_arm82.cpp
diff --git a/third_party/llamafile/tinyblas_cpu_sgemm.inc b/archive/third_party/llamafile/tinyblas_cpu_sgemm.inc
similarity index 100%
rename from third_party/llamafile/tinyblas_cpu_sgemm.inc
rename to archive/third_party/llamafile/tinyblas_cpu_sgemm.inc
diff --git a/third_party/llamafile/tinyblas_cpu_sgemm_amd_avx.cpp b/archive/third_party/llamafile/tinyblas_cpu_sgemm_amd_avx.cpp
similarity index 100%
rename from third_party/llamafile/tinyblas_cpu_sgemm_amd_avx.cpp
rename to archive/third_party/llamafile/tinyblas_cpu_sgemm_amd_avx.cpp
diff --git a/third_party/llamafile/tinyblas_cpu_sgemm_amd_avx2.cpp b/archive/third_party/llamafile/tinyblas_cpu_sgemm_amd_avx2.cpp
similarity index 100%
rename from third_party/llamafile/tinyblas_cpu_sgemm_amd_avx2.cpp
rename to archive/third_party/llamafile/tinyblas_cpu_sgemm_amd_avx2.cpp
diff --git a/third_party/llamafile/tinyblas_cpu_sgemm_amd_avx512f.cpp b/archive/third_party/llamafile/tinyblas_cpu_sgemm_amd_avx512f.cpp
similarity index 100%
rename from third_party/llamafile/tinyblas_cpu_sgemm_amd_avx512f.cpp
rename to archive/third_party/llamafile/tinyblas_cpu_sgemm_amd_avx512f.cpp
diff --git a/third_party/llamafile/tinyblas_cpu_sgemm_amd_avxvnni.cpp b/archive/third_party/llamafile/tinyblas_cpu_sgemm_amd_avxvnni.cpp
similarity index 100%
rename from third_party/llamafile/tinyblas_cpu_sgemm_amd_avxvnni.cpp
rename to archive/third_party/llamafile/tinyblas_cpu_sgemm_amd_avxvnni.cpp
diff --git a/third_party/llamafile/tinyblas_cpu_sgemm_amd_fma.cpp b/archive/third_party/llamafile/tinyblas_cpu_sgemm_amd_fma.cpp
similarity index 100%
rename from third_party/llamafile/tinyblas_cpu_sgemm_amd_fma.cpp
rename to archive/third_party/llamafile/tinyblas_cpu_sgemm_amd_fma.cpp
diff --git a/third_party/llamafile/tinyblas_cpu_sgemm_amd_zen4.cpp b/archive/third_party/llamafile/tinyblas_cpu_sgemm_amd_zen4.cpp
similarity index 100%
rename from third_party/llamafile/tinyblas_cpu_sgemm_amd_zen4.cpp
rename to archive/third_party/llamafile/tinyblas_cpu_sgemm_amd_zen4.cpp
diff --git a/third_party/llamafile/tinyblas_cpu_sgemm_arm.inc b/archive/third_party/llamafile/tinyblas_cpu_sgemm_arm.inc
similarity index 100%
rename from third_party/llamafile/tinyblas_cpu_sgemm_arm.inc
rename to archive/third_party/llamafile/tinyblas_cpu_sgemm_arm.inc
diff --git a/third_party/llamafile/tinyblas_cpu_sgemm_arm80.cpp b/archive/third_party/llamafile/tinyblas_cpu_sgemm_arm80.cpp
similarity index 100%
rename from third_party/llamafile/tinyblas_cpu_sgemm_arm80.cpp
rename to archive/third_party/llamafile/tinyblas_cpu_sgemm_arm80.cpp
diff --git a/third_party/llamafile/tinyblas_cpu_sgemm_arm82.cpp b/archive/third_party/llamafile/tinyblas_cpu_sgemm_arm82.cpp
similarity index 100%
rename from third_party/llamafile/tinyblas_cpu_sgemm_arm82.cpp
rename to archive/third_party/llamafile/tinyblas_cpu_sgemm_arm82.cpp
diff --git a/third_party/llamafile/tinyblas_cpu_sgemm_x86.inc b/archive/third_party/llamafile/tinyblas_cpu_sgemm_x86.inc
similarity index 100%
rename from third_party/llamafile/tinyblas_cpu_sgemm_x86.inc
rename to archive/third_party/llamafile/tinyblas_cpu_sgemm_x86.inc
diff --git a/third_party/llamafile/tinyblas_cpu_unsupported.cpp b/archive/third_party/llamafile/tinyblas_cpu_unsupported.cpp
similarity index 100%
rename from third_party/llamafile/tinyblas_cpu_unsupported.cpp
rename to archive/third_party/llamafile/tinyblas_cpu_unsupported.cpp
diff --git a/third_party/nlohmann/json.hpp b/archive/third_party/nlohmann/json.hpp
similarity index 100%
rename from third_party/nlohmann/json.hpp
rename to archive/third_party/nlohmann/json.hpp
diff --git a/third_party/nlohmann/json_fwd.hpp b/archive/third_party/nlohmann/json_fwd.hpp
similarity index 100%
rename from third_party/nlohmann/json_fwd.hpp
rename to archive/third_party/nlohmann/json_fwd.hpp
diff --git a/third_party/prometheus-cpp b/archive/third_party/prometheus-cpp
similarity index 100%
rename from third_party/prometheus-cpp
rename to archive/third_party/prometheus-cpp
diff --git a/third_party/pybind11 b/archive/third_party/pybind11
similarity index 100%
rename from third_party/pybind11
rename to archive/third_party/pybind11
diff --git a/third_party/spdlog b/archive/third_party/spdlog
similarity index 100%
rename from third_party/spdlog
rename to archive/third_party/spdlog
diff --git a/third_party/xxHash b/archive/third_party/xxHash
similarity index 100%
rename from third_party/xxHash
rename to archive/third_party/xxHash
diff --git a/doc/assets/heterogeneous_computing.png b/doc/assets/heterogeneous_computing.png
new file mode 100644
index 0000000..c90717b
Binary files /dev/null and b/doc/assets/heterogeneous_computing.png differ