Llama-quantize: Partial requant feature (#1313)

* Partial Requant feature for llama-quantize
  - Inspired by the recently ported --dry-run feature.
  - Allows partial requantization of a split quantized .gguf: only the splits missing from the destination directory are requantized.
  - Works for GGUFs split tensor by tensor as well as GGUFs split into groups of several tensors (the latter is not well tested beyond 2 tensors per split).
  - Vibe coded.
* Create output directory if it doesn't exist in llama-quantize
* Create output directory if it doesn't exist in gguf-split
* Add exit when directory fails to be created on Windows
* Use std::filesystem
* cleanup
2026-03-05 11:30:09 +00:00 · 2026-02-25 07:25:15 +01:00
parent 68431b049a
commit 170467e835
5 changed files with 69 additions and 2 deletions
--- a/include/llama.h
+++ b/include/llama.h
@@ -491,6 +491,7 @@ extern "C" {
        bool ignore_imatrix_rules;           // If set to true, the built-in rules for refusing to quantize into certain quants without imatrix are ignored
        bool only_repack;                    // Only repack tensors
        bool dry_run;                        //
+        bool partial_requant;                // quantize only missing split files in the split quantized .gguf destination directory
        void * imatrix;                      // pointer to importance matrix data
        void * kv_overrides;                 // pointer to vector containing overrides
        void * custom_quants;                // pointer to vector containing custom quantization rules