Llama-quantize: Partial requant feature (#1313)

* Partial Requant feature for llama-quantize

- Inspired by the recently ported --dry-run feature.
- Partially requantizes a split quantized .gguf by requantizing only the splits that are missing from the destination directory.
- Works for GGUFs split tensor by tensor as well as GGUFs split into groups of several tensors (the latter is not well tested beyond 2 tensors per split).
- Vibe coded.

* Create output directory if it doesn't exist in llama-quantize

* Create output directory if it doesn't exist in gguf-split

* Add exit when directory fails to be created on Windows

* Use std::filesystem

* cleanup
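
The directory handling described in the items above (create the output directory if needed, bail out on failure, use std::filesystem) could look roughly like the sketch below. The helper name `ensure_parent_dir` is hypothetical and this is not the code from the commit; it only shows one portable way to do it with `std::filesystem::create_directories`.

```cpp
#include <cstdio>
#include <filesystem>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper (not from the commit): makes sure the parent directory
// of output_file exists. Returns false if it is missing and cannot be created,
// so the caller can exit instead of failing later when writing the file.
static bool ensure_parent_dir(const std::string & output_file) {
    const fs::path dir = fs::path(output_file).parent_path();
    if (dir.empty()) {
        return true; // the file goes into the current directory
    }
    std::error_code ec;
    fs::create_directories(dir, ec); // no error is reported if dir already exists
    if (ec) {
        std::fprintf(stderr, "failed to create directory '%s': %s\n",
                     dir.string().c_str(), ec.message().c_str());
        return false;
    }
    return true;
}
```

Using the `std::error_code` overload avoids exceptions and behaves the same on Windows and POSIX systems, which is the main reason to prefer it over platform-specific calls.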
This commit is contained in:
Nexes the Elder
2026-02-25 07:25:15 +01:00
committed by GitHub
parent 68431b049a
commit 170467e835
5 changed files with 69 additions and 2 deletions

@@ -491,6 +491,7 @@ extern "C" {
bool ignore_imatrix_rules; // If set to true, the built-in rules for refusing to quantize into certain quants without imatrix are ignored
bool only_repack; // Only repack tensors
bool dry_run; //
bool partial_requant; // quantize only missing split files in the split quantized .gguf destination directory
void * imatrix; // pointer to importance matrix data
void * kv_overrides; // pointer to vector containing overrides
void * custom_quants; // pointer to vector containing custom quantization rules