llama-quantize: Partial requant feature (#1313)

* Partial Requant feature for llama-quantize
  - Inspired by the recently ported --dry-run feature.
  - Allows partially requantizing a split quantized .gguf by requantizing only the splits missing from the destination directory.
  - Works both for GGUF files split tensor by tensor and for those split by groups of several tensors (the latter has not been tested much beyond 2 tensors per split).
  - Vibe coded.

* Create output directory if it doesn't exist in llama-quantize

* Create output directory if it doesn't exist in gguf-split

* Exit when the output directory cannot be created on Windows

* Use std::filesystem

* Cleanup
2026-02-28 17:14:17 +00:00 · 2026-02-25 07:25:15 +01:00
parent 68431b049a
commit 170467e835
5 changed files with 69 additions and 2 deletions
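The two ideas in this commit message (requantize only the splits that are missing from the destination, and create the output directory with std::filesystem if it does not exist) can be sketched as below. This is a minimal, hypothetical sketch, not the code from this commit: the helper names `split_path`, `missing_splits` and `ensure_output_dir` are illustrative, and the `<prefix>-00001-of-00003.gguf` split naming scheme is an assumption about how the split files are named.

```cpp
#include <cassert>
#include <cstdio>
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: builds the file name of split `split_no` (0-based),
// ASSUMING a "<prefix>-NNNNN-of-NNNNN.gguf" naming scheme with 1-based numbering.
std::string split_path(const std::string & prefix, int split_no, int split_count) {
    assert(split_no >= 0 && split_no < split_count);
    char suffix[64];
    std::snprintf(suffix, sizeof(suffix), "-%05d-of-%05d.gguf", split_no + 1, split_count);
    return prefix + suffix;
}

// Hypothetical helper: returns the 0-based indices of the splits that do not
// exist yet, i.e. the only ones a partial requant would need to produce.
std::vector<int> missing_splits(const std::string & prefix, int split_count) {
    std::vector<int> missing;
    for (int i = 0; i < split_count; ++i) {
        if (!fs::exists(split_path(prefix, i, split_count))) {
            missing.push_back(i);
        }
    }
    return missing;
}

// Hypothetical helper: creates the directory that will hold the output files.
// Returns false (so the caller can exit) if the directory cannot be created.
bool ensure_output_dir(const std::string & prefix) {
    const fs::path dir = fs::path(prefix).parent_path();
    if (dir.empty()) {
        return true; // output goes to the current directory
    }
    std::error_code ec;
    fs::create_directories(dir, ec); // ec stays clear if the directory already exists
    if (ec) {
        std::fprintf(stderr, "failed to create directory '%s': %s\n",
                     dir.string().c_str(), ec.message().c_str());
        return false;
    }
    return true;
}
```

With three expected splits and only the second one already present in the destination, `missing_splits(prefix, 3)` returns `{0, 2}`, so only the first and third splits would be requantized. Using the `std::error_code` overload of `create_directories` keeps the failure path explicit, which matches the commit's "exit when the directory fails to be created" step.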
--- a/src/llama.cpp
+++ b/src/llama.cpp
@@ -4414,6 +4414,7 @@ struct llama_model_quantize_params llama_model_quantize_default_params() {
        /*.ignore_imatrix_rules        =*/ false,
        /*.only_repack                 =*/ false,
        /*.dry_run                     =*/ false,
+        /*.partial_requant             =*/ false,
        /*.imatrix                     =*/ nullptr,
        /*.kv_overrides                =*/ nullptr,
        /*.custom_quants               =*/ nullptr,