# KTransformers Docker Packaging Guide
This directory contains scripts for building and distributing KTransformers Docker images with standardized naming conventions.
## Overview

The packaging system provides:

- Automated version detection from sglang, ktransformers, and LLaMA-Factory
- Multi-CPU variant support (AMX, AVX512, AVX2) with runtime auto-detection
- A standardized naming convention for easy identification and management
- Two distribution methods:
  - Local tar file export for offline distribution
  - DockerHub publishing for online distribution
## Naming Convention

Docker images follow this naming pattern:

```
sglang-v{sglang version}_ktransformers-v{ktransformers version}_{cpu info}_{gpu info}_{functionality}_{timestamp}
```
### Example Names

Tar file:

```
sglang-v0.5.6_ktransformers-v0.4.3_x86-intel-multi_cu128_sft_llamafactory-v0.9.3_20241212143022.tar
```

DockerHub tags:

- Full tag: `kvcache/ktransformers:sglang-v0.5.6_ktransformers-v0.4.3_x86-intel-multi_cu128_sft_llamafactory-v0.9.3_20241212143022`
- Simplified tag: `kvcache/ktransformers:v0.4.3-cu128`
### Name Components
| Component | Description | Example |
|---|---|---|
| sglang version | SGLang package version | v0.5.6 |
| ktransformers version | KTransformers version | v0.4.3 |
| cpu info | CPU instruction set support | x86-intel-multi (includes AMX/AVX512/AVX2) |
| gpu info | CUDA version | cu128 (CUDA 12.8) |
| functionality | Feature mode | sft_llamafactory-v0.9.3 or infer |
| timestamp | Build time (Beijing/UTC+8) | 20241212143022 |
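The components above combine into the full image name by simple underscore concatenation. A minimal sketch of that assembly (the `build_image_name` function and the fixed timestamp are illustrative; the real scripts in this directory derive the values automatically):

```shell
#!/usr/bin/env bash
# Hypothetical helper: assemble an image name from its components.
# The actual scripts detect these values during the build; here they
# are passed explicitly so the naming format is easy to see.
build_image_name() {
  local sglang_ver="$1" kt_ver="$2" cpu="$3" gpu="$4" func="$5" ts="$6"
  echo "sglang-v${sglang_ver}_ktransformers-v${kt_ver}_${cpu}_${gpu}_${func}_${ts}"
}

build_image_name 0.5.6 0.4.3 x86-intel-multi cu128 sft_llamafactory-v0.9.3 20241212143022
# → sglang-v0.5.6_ktransformers-v0.4.3_x86-intel-multi_cu128_sft_llamafactory-v0.9.3_20241212143022
```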
## Files

| File | Purpose |
|---|---|
| `Dockerfile` | Main Dockerfile with multi-CPU build and version extraction |
| `docker-utils.sh` | Shared utility functions for both scripts |
| `build-docker-tar.sh` | Build and export Docker image to tar file |
| `push-to-dockerhub.sh` | Build and push Docker image to DockerHub |
## Prerequisites

- Docker installed and running
- For DockerHub push: a Docker Hub account and an active login (`docker login`)
- Sufficient disk space (at least 20 GB recommended)
- Internet access (or local mirrors configured)
## Quick Start

### Build Local Tar File

```bash
cd docker

# Basic build
./build-docker-tar.sh

# With specific CUDA version and mirror
./build-docker-tar.sh \
  --cuda-version 12.8.1 \
  --ubuntu-mirror 1

# With proxy
./build-docker-tar.sh \
  --cuda-version 12.8.1 \
  --ubuntu-mirror 1 \
  --http-proxy "http://127.0.0.1:16981" \
  --https-proxy "http://127.0.0.1:16981" \
  --output-dir /path/to/output
```
### Push to DockerHub

```bash
cd docker

# Basic push (requires --repository)
./push-to-dockerhub.sh \
  --repository kvcache/ktransformers

# With simplified tag
./push-to-dockerhub.sh \
  --cuda-version 12.8.1 \
  --repository kvcache/ktransformers \
  --also-push-simplified

# Skip build if image exists
./push-to-dockerhub.sh \
  --repository kvcache/ktransformers \
  --skip-build
```
## Script Options

### build-docker-tar.sh

```
Build Configuration:
  --cuda-version VERSION   CUDA version (default: 12.8.1)
  --ubuntu-mirror 0|1      Use Tsinghua mirror (default: 0)
  --http-proxy URL         HTTP proxy URL
  --https-proxy URL        HTTPS proxy URL
  --cpu-variant VARIANT    CPU variant (default: x86-intel-multi)
  --functionality TYPE     Mode: sft or infer (default: sft)

Paths:
  --dockerfile PATH        Path to Dockerfile (default: ./Dockerfile)
  --context-dir PATH       Build context directory (default: .)
  --output-dir PATH        Output directory for tar (default: .)

Options:
  --dry-run                Preview without building
  --keep-image             Keep Docker image after export
  --build-arg KEY=VALUE    Additional build arguments
  -h, --help               Show help message
```
### push-to-dockerhub.sh

All options from `build-docker-tar.sh`, plus:

```
Registry Settings:
  --registry REGISTRY      Docker registry (default: docker.io)
  --repository REPO        Repository name (REQUIRED)

Options:
  --skip-build             Skip build if image exists
  --also-push-simplified   Also push simplified tag
  --max-retries N          Max push retries (default: 3)
  --retry-delay SECONDS    Delay between retries (default: 5)
```
## Usage Examples

### Example 1: Local Development Build

For testing on your local machine:

```bash
./build-docker-tar.sh \
  --cuda-version 12.8.1 \
  --output-dir ./builds \
  --keep-image
```

This will:

- Build the Docker image
- Export the tar file to the `./builds/` directory
- Keep the Docker image for local testing
### Example 2: Production Build for Distribution

For creating a production build with mirrors and proxy:

```bash
./build-docker-tar.sh \
  --cuda-version 12.8.1 \
  --ubuntu-mirror 1 \
  --http-proxy "http://127.0.0.1:16981" \
  --https-proxy "http://127.0.0.1:16981" \
  --output-dir /mnt/data/releases
```
### Example 3: Publish to DockerHub

```bash
# First, log in to Docker Hub
docker login

# Then push
./push-to-dockerhub.sh \
  --cuda-version 12.8.1 \
  --repository kvcache/ktransformers \
  --also-push-simplified
```

This creates two tags:

- Full: `kvcache/ktransformers:sglang-v0.5.6_ktransformers-v0.4.3_x86-intel-multi_cu128_sft_llamafactory-v0.9.3_20241212143022`
- Simplified: `kvcache/ktransformers:v0.4.3-cu128`
### Example 4: Dry Run

Preview the build without actually building:

```bash
./build-docker-tar.sh --cuda-version 12.8.1 --dry-run
```
### Example 5: Custom Build Arguments

Pass additional Docker build arguments:

```bash
./build-docker-tar.sh \
  --cuda-version 12.8.1 \
  --build-arg SGL_VERSION=0.5.7 \
  --build-arg FLASHINFER_VERSION=0.5.4
```
## Using the Built Images

### Load from Tar File

```bash
# Load the image
docker load -i sglang-v0.5.6_ktransformers-v0.4.3_x86-intel-multi_cu128_sft_llamafactory-v0.9.3_20241212143022.tar

# Run the container
docker run -it --rm \
  --gpus all \
  sglang-v0.5.6_ktransformers-v0.4.3_x86-intel-multi_cu128_sft_llamafactory-v0.9.3_20241212143022 \
  /bin/bash
```
### Pull from DockerHub

```bash
# Pull with the full tag
docker pull kvcache/ktransformers:sglang-v0.5.6_ktransformers-v0.4.3_x86-intel-multi_cu128_sft_llamafactory-v0.9.3_20241212143022

# Or pull with the simplified tag
docker pull kvcache/ktransformers:v0.4.3-cu128

# Run the container
docker run -it --rm \
  --gpus all \
  kvcache/ktransformers:v0.4.3-cu128 \
  /bin/bash
```
### Inside the Container

The image contains two conda environments:

```bash
# Activate the serve environment (for inference with sglang)
conda activate serve
# or use the alias:
serve

# Activate the fine-tune environment (for training with LLaMA-Factory)
conda activate fine-tune
# or use the alias:
finetune
```
## Multi-CPU Variant Support

The Docker image includes all three CPU variants:

- AMX - for Intel Sapphire Rapids and newer (4th Gen Xeon+)
- AVX512 - for Intel Skylake-X, Ice Lake, and Cascade Lake
- AVX2 - maximum compatibility for older CPUs

The runtime automatically detects your CPU and loads the appropriate variant. To override:

```bash
# Force use of the AVX2 variant
export KT_KERNEL_CPU_VARIANT=avx2
python your_script.py

# Enable debug output to see which variant is loaded
export KT_KERNEL_DEBUG=1
python your_script.py
```
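The auto-detection boils down to checking CPU feature flags and picking the most capable variant available. A rough sketch of that idea, using the flag names Linux exposes in `/proc/cpuinfo` (the `pick_variant` function is illustrative; the actual selection logic lives inside kt-kernel and may differ):

```shell
#!/usr/bin/env bash
# Sketch of variant selection from CPU feature flags (illustrative only).
# Preference order: AMX > AVX512 > AVX2, matching the variants shipped
# in the image.
pick_variant() {
  local flags="$1"
  case " $flags " in
    *" amx_tile "*) echo amx ;;
    *" avx512f "*)  echo avx512 ;;
    *)              echo avx2 ;;
  esac
}

# On a real machine you would feed in the live flags, e.g.:
# pick_variant "$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)"
pick_variant "fpu avx2 avx512f"
# → avx512
```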
## Version Extraction

Versions are automatically extracted during the Docker build from:

- SGLang: from `sglang.__version__` in the serve environment
- KTransformers: from `version.py` in the ktransformers repository
- LLaMA-Factory: from `llamafactory.__version__` in the fine-tune environment

The versions are saved to `/workspace/versions.env` in the image:

```bash
# View versions in a running container
cat /workspace/versions.env

# Output:
# SGLANG_VERSION=0.5.6
# KTRANSFORMERS_VERSION=0.4.3
# LLAMAFACTORY_VERSION=0.9.3
```
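Because the file uses plain `KEY=VALUE` lines, shell scripts can consume it directly with `source`. A small sketch (it writes a local copy with the example values from above; inside the image you would source `/workspace/versions.env` itself):

```shell
#!/usr/bin/env bash
# Sketch: consume versions.env from a script. We create a copy with the
# example values so this runs anywhere; in the container, source
# /workspace/versions.env instead.
cat > /tmp/versions.env <<'EOF'
SGLANG_VERSION=0.5.6
KTRANSFORMERS_VERSION=0.4.3
LLAMAFACTORY_VERSION=0.9.3
EOF

source /tmp/versions.env
echo "ktransformers ${KTRANSFORMERS_VERSION} on sglang ${SGLANG_VERSION}"
# → ktransformers 0.4.3 on sglang 0.5.6
```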
## Troubleshooting

### Build Fails with Out of Disk Space

Check available disk space:

```bash
df -h
```

The build requires approximately 15-20 GB of disk space. Clean up Docker if needed:

```bash
docker system prune -a
```
### Version Extraction Fails

If version extraction fails (shows "unknown"), check that:

- The cloned repositories have the correct branches
- Python packages are properly installed in the conda environments
- Version files exist in the expected locations

You can verify manually by running:

```bash
docker run --rm <image> /bin/bash -c "
  source /opt/miniconda3/etc/profile.d/conda.sh &&
  conda activate serve &&
  python -c 'import sglang; print(sglang.__version__)'
"
```
### Push to DockerHub Fails

- Check login: run `docker login`
- Check the repository name: it must include a namespace (e.g., `kvcache/ktransformers`, not just `ktransformers`)
- Network issues: use the `--max-retries` and `--retry-delay` options
- Rate limiting: DockerHub enforces pull/push rate limits for free accounts
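The retry behavior behind `--max-retries` and `--retry-delay` amounts to a loop like the following. This is a simplified sketch, not the scripts' actual code; `retry_push` is an illustrative name:

```shell
#!/usr/bin/env bash
# Simplified sketch of a push retry loop (illustrative, not the real script).
# Retries the given command up to max_retries times, sleeping retry_delay
# seconds between attempts.
retry_push() {
  local max_retries="$1" retry_delay="$2"; shift 2
  local attempt=1
  while true; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max_retries" ]; then
      echo "push failed after ${max_retries} attempts" >&2
      return 1
    fi
    echo "attempt ${attempt} failed, retrying in ${retry_delay}s..." >&2
    sleep "$retry_delay"
    attempt=$((attempt + 1))
  done
}

# Usage (would normally wrap the real push):
# retry_push 3 5 docker push kvcache/ktransformers:v0.4.3-cu128
```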
## Advanced Topics

### Custom Dockerfile Location

```bash
./build-docker-tar.sh \
  --dockerfile /path/to/custom/Dockerfile \
  --context-dir /path/to/build/context
```
### Building an Inference-Only Image (Future)

Currently, the image always includes both the serve and fine-tune environments. To create an inference-only image, modify the Dockerfile to skip the fine-tune environment section.

### Customizing CPU Variants

To build only specific CPU variants, modify `kt-kernel/install.sh` or set environment variables in the Dockerfile.
### CI/CD Integration

The scripts are designed for manual execution but can be integrated into CI/CD pipelines:

```yaml
# Example GitHub Actions workflow step
- name: Build and push Docker image
  run: |
    cd docker
    ./push-to-dockerhub.sh \
      --cuda-version ${{ matrix.cuda_version }} \
      --repository ${{ secrets.DOCKER_REPOSITORY }} \
      --also-push-simplified
```
## Support

For issues and questions:

- File an issue at: https://github.com/kvcache-ai/ktransformers/issues
- Check the documentation: https://github.com/kvcache-ai/ktransformers
## License

This packaging system is part of KTransformers and follows the same license.