Files
ktransformers/archive/Dockerfile
Jiaqi Liao 57d14d22bc Refactor: restructure repository to focus on kt-kernel and KT-SFT modulesq recon (#1581)
* refactor: move legacy code to archive/ directory

  - Moved ktransformers, csrc, third_party, merge_tensors to archive/
  - Moved build scripts and configurations to archive/
  - Kept kt-kernel, KT-SFT, doc, and README files in root
  - Preserved complete git history for all moved files

* refactor: restructure repository to focus on kt-kernel and KT-SFT modules

* fix README

* fix README

* fix README

* fix README

* docs: add performance benchmarks to kt-kernel section

Add comprehensive performance data for kt-kernel to match KT-SFT's presentation:
- AMX kernel optimization: 21.3 TFLOPS (3.9× faster than PyTorch)
- Prefill phase: up to 20× speedup vs baseline
- Decode phase: up to 4× speedup
- NUMA optimization: up to 63% throughput improvement
- Multi-GPU (8×L20): 227.85 tokens/s total throughput with DeepSeek-R1 FP8

Source: https://lmsys.org/blog/2025-10-22-KTransformers/

This provides users with concrete performance metrics for both core modules,
making it easier to understand the capabilities of each component.

* refactor: improve kt-kernel performance data with specific hardware and models

Replace generic performance descriptions with concrete benchmarks:
- Specify exact hardware: 8×L20 GPU + Xeon Gold 6454S, Single/Dual-socket Xeon + AMX
- Include specific models: DeepSeek-R1-0528 (FP8), DeepSeek-V3 (671B)
- Show detailed metrics: total throughput, output throughput, concurrency details
- Match KT-SFT presentation style for consistency

This provides users with actionable performance data they can use to evaluate
hardware requirements and expected performance for their use cases.

* fix README

* docs: clean up performance table and improve formatting

* add pic for README

* refactor: simplify .gitmodules and backup legacy submodules

- Remove 7 legacy submodules from root .gitmodules (archive/third_party/*)
- Keep only 2 active submodules for kt-kernel (llama.cpp, pybind11)
- Backup complete .gitmodules to archive/.gitmodules
- Add documentation in archive/README.md for researchers who need legacy submodules

This reduces initial clone size by ~500MB and avoids downloading unused dependencies.

* refactor: move doc/ back to root directory

Keep documentation in root for easier access and maintenance.

* refactor: consolidate all images to doc/assets/

- Move kt-kernel/assets/heterogeneous_computing.png to doc/assets/
- Remove KT-SFT/assets/ (images already in doc/assets/)
- Update KT-SFT/README.md image references to ../doc/assets/
- Eliminates ~7.9MB image duplication
- Centralizes all documentation assets in one location

* fix pic path for README
2025-11-10 17:42:26 +08:00

64 lines
1.4 KiB
Docker
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
FROM pytorch/pytorch:2.5.1-cuda12.1-cudnn9-devel as compile_server
ARG CPU_INSTRUCT=NATIVE
# 设置工作目录和 CUDA 路径
WORKDIR /workspace
ENV CUDA_HOME=/usr/local/cuda
# 安装依赖
RUN apt update -y
RUN apt install -y --no-install-recommends \
libtbb-dev \
libssl-dev \
libcurl4-openssl-dev \
libaio1 \
libaio-dev \
libfmt-dev \
libgflags-dev \
zlib1g-dev \
patchelf \
git \
wget \
vim \
gcc \
g++ \
cmake
# 拷贝代码
RUN git clone https://github.com/kvcache-ai/ktransformers.git
# 清理 apt 缓存
RUN rm -rf /var/lib/apt/lists/*
# 进入项目目录
WORKDIR /workspace/ktransformers
# 初始化子模块
RUN git submodule update --init --recursive
# 升级 pip
RUN pip install --upgrade pip
# 安装构建依赖
RUN pip install ninja pyproject numpy cpufeature aiohttp zmq openai
# 安装 flash-attn提前装可以避免后续某些编译依赖出错
RUN pip install flash-attn
# 安装 ktransformers 本体(含编译)
RUN CPU_INSTRUCT=${CPU_INSTRUCT} \
USE_BALANCE_SERVE=1 \
KTRANSFORMERS_FORCE_BUILD=TRUE \
TORCH_CUDA_ARCH_LIST="8.0;8.6;8.7;8.9;9.0+PTX" \
pip install . --no-build-isolation --verbose
RUN pip install third_party/custom_flashinfer/
# 清理 pip 缓存
RUN pip cache purge
# 拷贝 C++ 运行时库
RUN cp /usr/lib/x86_64-linux-gnu/libstdc++.so.6 /opt/conda/lib/
# 保持容器运行(调试用)
ENTRYPOINT ["tail", "-f", "/dev/null"]