ik_llama.cpp/github-data/issues/228 - Feature Request_ create tool to offline repack models.md
2025-07-23 13:31:53 +02:00


#228 - Feature Request: create tool to offline repack models

Author ikawrakow
State Closed
Created 2025-02-23
Updated 2025-03-21

Description

Prerequisites

  • I am running the latest code. Mention the version if possible as well.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

Add a tool to repack an existing quantized model to _R4/_R8 quants and store the result on disk for later use.

Motivation

Run-time repacking increases performance, but can significantly prolong model loading for very large models such as DeepSeekV3/R1. One can of course re-quantize the model to _R4/_R8 quants, but the original f16/bf16 model may not be available (because, e.g., it is extremely large and the user did not download it). Hence, it would be useful to have a tool that repacks an existing quantized model to _R4/_R8 quants and stores the resulting model on disk.
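For context, the core transformation behind _R4 repacking is row interleaving: quant blocks from four consecutive rows are rearranged so a kernel can load one block from each of the four rows contiguously. The sketch below illustrates that idea only; the function name, the array layout, and the block sizes are illustrative assumptions, not the actual GGUF tensor formats or ik_llama.cpp APIs.

```python
import numpy as np

def repack_r4(blocks: np.ndarray) -> np.ndarray:
    """Illustrative row-interleaving, the idea behind _R4 repacking.

    blocks: shape (rows, blocks_per_row, block_bytes), one quant block
    per entry. Returns the same data with each group of 4 rows
    interleaved block-by-block. Layout details are hypothetical.
    """
    rows, blocks_per_row, block_bytes = blocks.shape
    assert rows % 4 == 0, "rows must be a multiple of 4"
    # Group rows in fours: (rows//4, 4, blocks_per_row, block_bytes)
    grouped = blocks.reshape(rows // 4, 4, blocks_per_row, block_bytes)
    # Swap the row and block axes within each group, so that the 4
    # rows' blocks for a given column position sit next to each other.
    interleaved = grouped.transpose(0, 2, 1, 3)
    return np.ascontiguousarray(interleaved)
```

An offline tool as requested would apply such a transformation to every repackable tensor once, then serialize the result, so the cost is paid at conversion time instead of at every model load.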

Possible Implementation

No response