ik_llama.cpp/github-data/issues/228 - Feature Request_ create tool to offline repack models.md
2025-07-23 13:31:53 +02:00


#228 - Feature Request: create tool to offline repack models

Author ikawrakow
State Closed
Created 2025-02-23
Updated 2025-03-21

Description

Prerequisites

  • I am running the latest code. Mention the version if possible as well.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

Add a tool to repack an existing quantized model to _R4/_R8 quants and store the result on disk for later use.

Motivation

Run-time repacking increases performance, but can significantly prolong model loading for very large models such as DeepSeekV3/R1. One can of course re-quantize the model to _R4/_R8 quants, but the original f16/bf16 model may not be available (because, e.g., it is extremely large and the user did not download it). Hence, it would be useful to have a tool that repacks an existing quantized model to _R4/_R8 quants and stores the resulting model on disk.
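For context, the core transformation behind _R4 repacking is row interleaving: quant blocks from four consecutive rows are rearranged so a kernel can load one block from each of the four rows contiguously. The sketch below illustrates that idea only; the function name, the array layout, and the block sizes are illustrative assumptions, not the actual GGUF tensor formats or ik_llama.cpp APIs.

```python
import numpy as np

def repack_r4(blocks: np.ndarray) -> np.ndarray:
    """Illustrative row-interleaving, the idea behind _R4 repacking.

    blocks: shape (rows, blocks_per_row, block_bytes), one quant block
    per entry. Returns the same data with each group of 4 rows
    interleaved block-by-block. Layout details are hypothetical.
    """
    rows, blocks_per_row, block_bytes = blocks.shape
    assert rows % 4 == 0, "rows must be a multiple of 4"
    # Group rows in fours: (rows//4, 4, blocks_per_row, block_bytes)
    grouped = blocks.reshape(rows // 4, 4, blocks_per_row, block_bytes)
    # Swap the row and block axes within each group, so that the 4
    # rows' blocks for a given column position sit next to each other.
    interleaved = grouped.transpose(0, 2, 1, 3)
    return np.ascontiguousarray(interleaved)
```

An offline tool as requested would apply such a transformation to every repackable tensor once, then serialize the result, so the cost is paid at conversion time instead of at every model load.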

Possible Implementation

No response