Build and use ik_llama.cpp with CPU or CPU+CUDA
Built on top of ikawrakow/ik_llama.cpp and llama-swap
All commands are provided for Podman and Docker.
The CPU or CUDA sections under Build and Run are enough to get up and running.
Overview
Build
Using docker-bake (Recommended)
The project uses Docker Bake for building multiple targets efficiently.
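For orientation, the relevant pieces of `docker-bake.hcl` look roughly like the following. This is a sketch, not the authoritative file — the repo's actual `docker-bake.hcl` defines more variables (e.g. `VARIANT`) and cache settings:

```hcl
# Overridable from the environment, e.g. REPO_OWNER=yourname docker buildx bake ...
variable "REPO_OWNER" { default = "localhost" }

target "full" {
  dockerfile = "ik_llama-cpu.Containerfile"
  target     = "full"
  tags       = ["${REPO_OWNER}/ik_llama-cpu:full"]
}

target "swap" {
  dockerfile = "ik_llama-cpu.Containerfile"
  target     = "swap"
  tags       = ["${REPO_OWNER}/ik_llama-cpu:swap"]
}
```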
CPU Variant
```shell
docker buildx bake --builder ik-llama-builder full swap
```
Or with custom tags:
```shell
REPO_OWNER=yourname docker buildx bake --builder ik-llama-builder \
  -f ./docker-bake.hcl \
  full swap
```
CUDA Variant
First, set the CUDA version and GPU architecture in `ik_llama-cuda.Containerfile`:

- `CUDA_DOCKER_ARCH`: your GPU's compute capability (e.g., `86` for RTX 30*, `89` for RTX 40*, `12.0` for RTX 50*)
- `CUDA_VERSION`: the CUDA Toolkit version (e.g., `12.6.2`, `13.1.1`)
```shell
VARIANT=cu12 docker buildx bake --builder ik-llama-builder full swap
```
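If you are unsure of your GPU's compute capability, recent NVIDIA drivers can report it via `nvidia-smi`; the snippet below (a sketch — `cap` is a hard-coded example value, substitute your GPU's output) shows the conversion to the `CUDA_DOCKER_ARCH` form:

```shell
# The compute capability can be read from recent NVIDIA drivers with:
#   nvidia-smi --query-gpu=compute_cap --format=csv,noheader
# CUDA_DOCKER_ARCH is that value with the dot removed:
cap="8.6"            # example output for an RTX 3090; substitute yours
arch="${cap//./}"    # "8.6" -> "86"
echo "$arch"
```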
Build Targets
Builds two image tags per variant:
- `full`: includes `llama-server`, `llama-quantize`, and other utilities.
- `swap`: includes only `llama-swap` and `llama-server`.
Local Development
- Clone the repository: `git clone https://github.com/ikawrakow/ik_llama.cpp`
- Enter the repo: `cd ik_llama.cpp`
- Use docker-bake as shown above.
Run
- Download `.gguf` model files to your favorite directory (e.g., `/my_local_files/gguf`).
- Map it to `/models` inside the container.
- Open `http://localhost:9292` in a browser and enjoy the features.
- API endpoints are available at `http://localhost:9292/v1` for use in other applications.
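Once the container is up, the API endpoints can be exercised with plain `curl`. A sketch — the model name comes from your `llama-swap` config, so `my-model` below is a placeholder:

```shell
base="http://localhost:9292"

# List available models (requires the container to be running):
# curl -s "${base}/v1/models"

# OpenAI-compatible chat completion:
# curl -s "${base}/v1/chat/completions" \
#   -H 'Content-Type: application/json' \
#   -d '{"model":"my-model","messages":[{"role":"user","content":"Hello"}]}'
```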
CPU
```shell
# Podman
podman run -it --name ik_llama --rm -p 9292:8080 -v /my_local_files/gguf:/models:ro localhost/ik_llama-cpu:swap

# Docker
docker run -it --name ik_llama --rm -p 9292:8080 -v /my_local_files/gguf:/models:ro localhost/ik_llama-cpu:swap
```
CUDA
- Install NVIDIA drivers and CUDA on the host.
- For Docker, install the NVIDIA Container Toolkit.
- For Podman, install CDI (Container Device Interface) support.
- Identify your GPU:
  - CUDA GPU compute capability (e.g., `8.6` for RTX 30*, `8.9` for RTX 40*, `12.0` for RTX 50*)
  - Supported CUDA Toolkit version
```shell
# Podman
podman run -it --name ik_llama --rm -p 9292:8080 -v /my_local_files/gguf:/models:ro --device nvidia.com/gpu=all --security-opt=label=disable localhost/ik_llama-cuda:swap

# Docker
docker run -it --name ik_llama --rm -p 9292:8080 -v /my_local_files/gguf:/models:ro --runtime nvidia localhost/ik_llama-cuda:swap
```
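Before loading a model, it can be worth confirming the GPU is visible inside the container. A sketch, assuming the image tag from the local build above; the entrypoint is overridden to run `nvidia-smi` once and exit:

```shell
image="localhost/ik_llama-cuda:swap"   # as built by docker-bake above

# Podman (CDI):
# podman run --rm --device nvidia.com/gpu=all --security-opt=label=disable \
#   --entrypoint nvidia-smi "$image"

# Docker:
# docker run --rm --runtime nvidia --entrypoint nvidia-smi "$image"
```

If `nvidia-smi` does not list your GPU here, recheck the NVIDIA Container Toolkit or CDI setup before debugging the server itself.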
Troubleshooting
- If CUDA is not available, use `ik_llama-cpu` instead.
- If models are not found, ensure you mount the correct directory: `-v /my_local_files/gguf:/models:ro`.
- If you need to install `podman` or `docker`, follow the Podman Installation or Install Docker Engine guide for your OS.
Extra
- Custom commit: build a specific `ik_llama.cpp` commit by modifying the Containerfile or using build args:

  ```shell
  docker buildx bake --builder ik-llama-builder --set full.args.BUILD_COMMIT=1ec12b8 full
  ```

- Using the tools in the `full` image:

  ```shell
  # Podman (CPU image)
  podman run -it --name ik_llama_full --rm -v /my_local_files/gguf:/models:ro --entrypoint bash localhost/ik_llama-cpu:full

  # Docker (CUDA image)
  docker run -it --name ik_llama_full --rm -v /my_local_files/gguf:/models:ro --runtime nvidia --entrypoint bash localhost/ik_llama-cuda:full
  ```

  Then, inside the container:

  ```shell
  ./llama-quantize ...
  python3 gguf-py/scripts/gguf_dump.py ...
  ./llama-perplexity ...
  ./llama-sweep-bench ...
  ```

- Customize the `llama-swap` config: save `./docker/ik_llama-cpu-swap.config.yaml` or `./docker/ik_llama-cuda-swap.config.yaml` locally (e.g., under `/my_local_files/`), then map it to `/app/config.yaml` inside the container by appending `-v /my_local_files/ik_llama-cpu-swap.config.yaml:/app/config.yaml:ro` to your `podman run ...` or `docker run ...`.
- Run in the background: replace `-it` with `-d` (`podman run -d ...` or `docker run -d ...`). To stop the container: `podman stop ik_llama` or `docker stop ik_llama`.
- `GGML_NATIVE`: if you build the image on a different machine than the one it will run on, change `-DGGML_NATIVE=ON` to `-DGGML_NATIVE=OFF` in the `.Containerfile`.
- KV quantization types: to enable more KV quantization types, build with `-DGGML_IQK_FA_ALL_QUANTS=ON`.
- Clean up unused CUDA images: if you experiment with several `CUDA_VERSION`s, delete the unused images (they are several GB each):

  ```shell
  podman image rm docker.io/nvidia/cuda:12.4.0-runtime-ubuntu22.04 && \
  podman image rm docker.io/nvidia/cuda:12.4.0-devel-ubuntu22.04
  ```

- Build without `llama-swap`: change `--target swap` to `--target server` in docker-bake or the Containerfiles.
- Pre-made quants: look for pre-made quants from ubergarm.
- GGUF tools: build custom quants with Thireus's tools.
- Prebuilt binaries: download from ik_llama.cpp's Thireus fork, which provides release builds for macOS/Windows/Ubuntu CPU and Windows CUDA.
- KoboldCPP experience: Croco.Cpp is a fork of KoboldCPP that runs GGUF/GGML models on CPU/CUDA with KoboldAI's UI. It is powered in part by ik_llama.cpp and is compatible with most of Ikawrakow's quants, except Bitnet.
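The `GGML_NATIVE` edit above can be scripted rather than done by hand. A minimal sketch, assuming GNU `sed` and the flag spelled exactly as above (it is demonstrated on a throwaway file; point `sed` at the real `ik_llama-cpu.Containerfile` or `ik_llama-cuda.Containerfile` instead):

```shell
# Create a stand-in file containing the flag, then flip ON to OFF in place:
printf 'cmake -B build -DGGML_NATIVE=ON\n' > /tmp/demo.Containerfile
sed -i 's/-DGGML_NATIVE=ON/-DGGML_NATIVE=OFF/' /tmp/demo.Containerfile
cat /tmp/demo.Containerfile   # -> cmake -B build -DGGML_NATIVE=OFF
```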
Credits
All credit goes to the awesome community: