mirror of https://github.com/kvcache-ai/ktransformers.git synced 2026-04-20 14:29:22 +00:00

Files

Jianwei Dong 1f79f6da92 [feat](kt-kernel): Add automatic deployment workflow (#1719 )

2025-12-16 15:20:06 +08:00

9.8 KiB

Raw Blame History

KTransformers Docker Packaging Guide

This directory contains scripts for building and distributing KTransformers Docker images with standardized naming conventions.

Overview

The packaging system provides:

Automated version detection from sglang, ktransformers, and LLaMA-Factory
Multi-CPU variant support (AMX, AVX512, AVX2) with runtime auto-detection
Standardized naming convention for easy identification and management
Two distribution methods:
- Local tar file export for offline distribution
- DockerHub publishing for online distribution

Naming Convention

Docker images follow this naming pattern:

sglang-v{sglang版本}_ktransformers-v{ktransformers版本}_{cpu信息}_{gpu信息}_{功能模式}_{时间戳}

Example Names

Tar file:

sglang-v0.5.6_ktransformers-v0.4.3_x86-intel-multi_cu128_sft_llamafactory-v0.9.3_20241212143022.tar

DockerHub tags:

Full tag:
kvcache/ktransformers:sglang-v0.5.6_ktransformers-v0.4.3_x86-intel-multi_cu128_sft_llamafactory-v0.9.3_20241212143022

Simplified tag:
kvcache/ktransformers:v0.4.3-cu128

Name Components

Component	Description	Example
sglang version	SGLang package version	`v0.5.6`
ktransformers version	KTransformers version	`v0.4.3`
cpu info	CPU instruction set support	`x86-intel-multi` (includes AMX/AVX512/AVX2)
gpu info	CUDA version	`cu128` (CUDA 12.8)
functionality	Feature mode	`sft_llamafactory-v0.9.3` or `infer`
timestamp	Build time (Beijing/UTC+8)	`20241212143022`

Files

File	Purpose
`Dockerfile`	Main Dockerfile with multi-CPU build and version extraction
`docker-utils.sh`	Shared utility functions for both scripts
`build-docker-tar.sh`	Build and export Docker image to tar file
`push-to-dockerhub.sh`	Build and push Docker image to DockerHub

Prerequisites

Docker installed and running
For DockerHub push: Docker Hub account and login (docker login)
Sufficient disk space (at least 20GB recommended)
Internet access (or local mirrors configured)

Quick Start

Build Local Tar File

cd docker

# Basic build
./build-docker-tar.sh

# With specific CUDA version and mirror
./build-docker-tar.sh \
  --cuda-version 12.8.1 \
  --ubuntu-mirror 1

# With proxy
./build-docker-tar.sh \
  --cuda-version 12.8.1 \
  --ubuntu-mirror 1 \
  --http-proxy "http://127.0.0.1:16981" \
  --https-proxy "http://127.0.0.1:16981" \
  --output-dir /path/to/output

Push to DockerHub

cd docker

# Basic push (requires --repository)
./push-to-dockerhub.sh \
  --repository kvcache/ktransformers

# With simplified tag
./push-to-dockerhub.sh \
  --cuda-version 12.8.1 \
  --repository kvcache/ktransformers \
  --also-push-simplified

# Skip build if image exists
./push-to-dockerhub.sh \
  --repository kvcache/ktransformers \
  --skip-build

Script Options

build-docker-tar.sh

Build Configuration:
  --cuda-version VERSION       CUDA version (default: 12.8.1)
  --ubuntu-mirror 0|1         Use Tsinghua mirror (default: 0)
  --http-proxy URL            HTTP proxy URL
  --https-proxy URL           HTTPS proxy URL
  --cpu-variant VARIANT       CPU variant (default: x86-intel-multi)
  --functionality TYPE        Mode: sft or infer (default: sft)

Paths:
  --dockerfile PATH           Path to Dockerfile (default: ./Dockerfile)
  --context-dir PATH          Build context directory (default: .)
  --output-dir PATH           Output directory for tar (default: .)

Options:
  --dry-run                   Preview without building
  --keep-image                Keep Docker image after export
  --build-arg KEY=VALUE       Additional build arguments
  -h, --help                  Show help message

push-to-dockerhub.sh

All options from build-docker-tar.sh, plus:

Registry Settings:
  --registry REGISTRY         Docker registry (default: docker.io)
  --repository REPO           Repository name (REQUIRED)

Options:
  --skip-build                Skip build if image exists
  --also-push-simplified      Also push simplified tag
  --max-retries N             Max push retries (default: 3)
  --retry-delay SECONDS       Delay between retries (default: 5)

Usage Examples

Example 1: Local Development Build

For testing on your local machine:

./build-docker-tar.sh \
  --cuda-version 12.8.1 \
  --output-dir ./builds \
  --keep-image

This will:

Build the Docker image
Export to tar in ./builds/ directory
Keep the Docker image for local testing

Example 2: Production Build for Distribution

For creating a production build with mirrors and proxy:

./build-docker-tar.sh \
  --cuda-version 12.8.1 \
  --ubuntu-mirror 1 \
  --http-proxy "http://127.0.0.1:16981" \
  --https-proxy "http://127.0.0.1:16981" \
  --output-dir /mnt/data/releases

Example 3: Publish to DockerHub

For publishing to DockerHub:

# First, login to Docker Hub
docker login

# Then push
./push-to-dockerhub.sh \
  --cuda-version 12.8.1 \
  --repository kvcache/ktransformers \
  --also-push-simplified

This creates two tags:

Full: kvcache/ktransformers:sglang-v0.5.6_ktransformers-v0.4.3_x86-intel-multi_cu128_sft_llamafactory-v0.9.3_20241212143022
Simplified: kvcache/ktransformers:v0.4.3-cu128

Example 4: Dry Run

Preview the build without actually building:

./build-docker-tar.sh --cuda-version 12.8.1 --dry-run

Example 5: Custom Build Arguments

Pass additional Docker build arguments:

./build-docker-tar.sh \
  --cuda-version 12.8.1 \
  --build-arg SGL_VERSION=0.5.7 \
  --build-arg FLASHINFER_VERSION=0.5.4

Using the Built Images

Load from Tar File

# Load the image
docker load -i sglang-v0.5.6_ktransformers-v0.4.3_x86-intel-multi_cu128_sft_llamafactory-v0.9.3_20241212143022.tar

# Run the container
docker run -it --rm \
  --gpus all \
  sglang-v0.5.6_ktransformers-v0.4.3_x86-intel-multi_cu128_sft_llamafactory-v0.9.3_20241212143022 \
  /bin/bash

Pull from DockerHub

# Pull with full tag
docker pull kvcache/ktransformers:sglang-v0.5.6_ktransformers-v0.4.3_x86-intel-multi_cu128_sft_llamafactory-v0.9.3_20241212143022

# Or pull with simplified tag
docker pull kvcache/ktransformers:v0.4.3-cu128

# Run the container
docker run -it --rm \
  --gpus all \
  kvcache/ktransformers:v0.4.3-cu128 \
  /bin/bash

Inside the Container

The image contains two conda environments:

# Activate serve environment (for inference with sglang)
conda activate serve
# or use the alias:
serve

# Activate fine-tune environment (for training with LLaMA-Factory)
conda activate fine-tune
# or use the alias:
finetune

Multi-CPU Variant Support

The Docker image includes all three CPU variants:

AMX - For Intel Sapphire Rapids and newer (4th Gen Xeon+)
AVX512 - For Intel Skylake-X, Ice Lake, Cascade Lake
AVX2 - Maximum compatibility for older CPUs

The runtime automatically detects your CPU and loads the appropriate variant. To override:

# Force use of AVX2 variant
export KT_KERNEL_CPU_VARIANT=avx2
python your_script.py

# Enable debug output to see which variant is loaded
export KT_KERNEL_DEBUG=1
python your_script.py

Version Extraction

Versions are automatically extracted during Docker build from:

SGLang: From sglang.__version__ in serve environment
KTransformers: From version.py in ktransformers repository
LLaMA-Factory: From llamafactory.__version__ in fine-tune environment

The versions are saved to /workspace/versions.env in the image:

# View versions in running container
cat /workspace/versions.env

# Output:
SGLANG_VERSION=0.5.6
KTRANSFORMERS_VERSION=0.4.3
LLAMAFACTORY_VERSION=0.9.3

Troubleshooting

Build Fails with Out of Disk Space

Check available disk space:

df -h

The build requires approximately 15-20GB of disk space. Clean up Docker:

docker system prune -a

Version Extraction Fails

If version extraction fails (shows "unknown"), check:

The cloned repositories have the correct branches
Python packages are properly installed in conda environments
Version files exist in expected locations

You can manually verify by running:

docker run --rm <image> /bin/bash -c "
  source /opt/miniconda3/etc/profile.d/conda.sh &&
  conda activate serve &&
  python -c 'import sglang; print(sglang.__version__)'
"

Push to DockerHub Fails

Check login: docker login
Check repository name: Must include namespace (e.g., kvcache/ktransformers, not just ktransformers)
Network issues: Use --max-retries and --retry-delay options
Rate limiting: DockerHub has pull/push rate limits for free accounts

Advanced Topics

Custom Dockerfile Location

./build-docker-tar.sh \
  --dockerfile /path/to/custom/Dockerfile \
  --context-dir /path/to/build/context

Building Only Inference Image (Future)

Currently, the image always includes both serve and fine-tune environments. To create an inference-only image, modify the Dockerfile to skip the fine-tune environment section.

Customizing CPU Variants

To build only specific CPU variants, modify kt-kernel/install.sh or set environment variables in the Dockerfile.

CI/CD Integration

The scripts are designed for manual execution but can be integrated into CI/CD pipelines:

# Example GitHub Actions workflow
- name: Build and push Docker image
  run: |
    cd docker
    ./push-to-dockerhub.sh \
      --cuda-version ${{ matrix.cuda_version }} \
      --repository ${{ secrets.DOCKER_REPOSITORY }} \
      --also-push-simplified

Support

For issues and questions:

File an issue at: https://github.com/kvcache-ai/ktransformers/issues
Check documentation: https://github.com/kvcache-ai/ktransformers

License

This packaging system is part of KTransformers and follows the same license.

9.8 KiB Raw Blame History