Commit Graph

11 Commits

Author SHA1 Message Date
Nexes the Elder
d4ac5f1566 gguf-split: fix the split output files naming (#1336)
* Fix gguf-split.cpp splits output naming

With this fix, the initial extension of the source .gguf file is not included in the naming of the output file before the numeration of the splits.

ex:

No more model.gguf-00001-of-00200.gguf
Instead, model-00001-of-00200.gguf

* increase ggml_max_context to 2048

* Revert GGML_MAX_CONTEXTS to 64
2026-03-02 08:43:47 +01:00
Nexes the Elder
170467e835 Llama-quantize: Partial requant feature (#1313)
* Partial Requant feature for llama-quantize

- Inspired by the recently portcopied --dry-run feature.
- Allows to partially requantize a split quantized .gguf by requantizing only the missing splits in the destination directory.
- Works both for GGUF which are split tensors by tensors, or by group of several tensors (though this one is not very much tested beyond 2 tensors by split).
- Vibe coded.

* Create output directory if it doesn't exist in llama-quantize

* Create output directory if it doesn't exist in gguf-split

* Add exit when directory fails to be created on Windows

* Use std::filesystem

* cleanup
2026-02-25 07:25:15 +01:00
Nexes the Elder
3c4f887b10 gguf-split : update (#444)
gguf-split : improve --split and --merge logic (#9619)

* make sure params --split and --merge are not specified at same time

* update gguf-split params parse logic

* Update examples/gguf-split/gguf-split.cpp

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: slaren <slarengh@gmail.com>

---------

gguf-split : add basic checks (#9499)

* gguf-split : do not overwrite existing files when merging

* gguf-split : error when too many arguments are passed

Authored-by: slaren <slarengh@gmail.com>
2025-05-23 08:07:42 +03:00
Christian Zhou-Zheng
c0600f4f5c gguf-split : change binary multi-byte units to decimal (#7803) 2024-06-07 15:56:01 +03:00
Xuan Son Nguyen
a93233ac8c gguf-split: add --no-tensor-first-split (#7072) 2024-05-04 18:56:22 +02:00
Sigbjørn Skjæret
10579b6e66 Fix --split-max-size (#6655)
* Fix --split-max-size

Byte size calculation was done on int and overflowed.

* add tests.sh

* add examples test scripts to ci run

Will autodiscover examples/*/tests.sh scripts and run them.

* move WORK_PATH to a subdirectory

* clean up before and after test

* explicitly define which scripts to run

* add --split-max-size to readme
2024-04-14 13:12:59 +02:00
Xuan Son Nguyen
75b580db0a split: allow --split-max-size option (#6343)
* split by max size

* clean up arg parse

* split: ok

* add dry run option

* error on 0 tensors

* be positive

* remove next_metadata_size
2024-03-29 22:34:44 +01:00
Pierrick Hymbert
c9d1476f75 common: llama_load_model_from_url split support (#6192)
* llama: llama_split_prefix fix strncpy does not include string termination
common: llama_load_model_from_url:
 - fix header name case sensitive
 - support downloading additional split in parallel
 - hide password in url

* common: EOL EOF

* common: remove redundant LLAMA_CURL_MAX_PATH_LENGTH definition

* common: change max url max length

* common: minor comment

* server: support HF URL options

* llama: llama_model_loader fix log

* common: use a constant for max url length

* common: clean up curl if file cannot be loaded in gguf

* server: tests: add split tests, and HF options params

* common: move llama_download_hide_password_in_url inside llama_download_file as a lambda

* server: tests: enable back Release test on PR

* spacing

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* spacing

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* spacing

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-23 18:07:00 +01:00
Pierrick Hymbert
9127e8276f llama_model_loader: support multiple split/shard GGUFs (#6187)
* split: support in llama_model_loader

* avoid copying the entire vector

Co-authored-by: slaren <slarengh@gmail.com>

* split: move llama_tensor_offset to llama_model_loader

* llama_model_loader: PR feedbacks:
 - use only one gguf_context for metadata only
 - store all ggml_context in a vector as the files and mappings
 - store all weights in a vector along with the source tensor
 - rename ctx_gguf to meta
 - rename ctx_meta to contexts

* avoid copying the entire vector

* Simplify this by making these optional, switch some layer creation tensor optional

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Handle optional tensors

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* llama_model_loader: fail if backend cannot allocate buffer

* fix mmap buffer management

* llama_model_loader: map file to backend buffer if the allocation succeeds only

* llama_model_loader: only map tensors included in the context

* llama_model_loader: minor, use same variable name for consistency, fix spacing in types cast

* llama_model_loader: fail if any of backend buffer cannot be allocated

* spacing

Co-authored-by: slaren <slarengh@gmail.com>

* fix loop over pointer

Co-authored-by: slaren <slarengh@gmail.com>

* llama_model_loader: if n_tensors declared not equals to loaded tensors in split, throw an exception instead of asserting

* llama_model_loader: ensure mappings vector has the expected size

* llama_model_loader:  use at instead of operator[] if this should never add to the map.

* llama_model_loader: immediately add the backend buffer to the model buffers in order to free them if an error occurs in the next allocation. Reserve the expected size.

* llama_model_loader: be sure the model mappings has enough capacity before allocating backend buffer

* llama_model_loader: fix map -> unordered map

* llama_split_prefix: use a clearer version, not pass split path len but dest max len.

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>

* llama : minor

ggml-ci

* llama : introduce some typedef helpers

* docs: add model shard in hot topic

* llama_model_loader: put mapping in a unique_ptr from the moment it is allocated

Co-authored-by: slaren <slarengh@gmail.com>

* fix llama_split_prefix

---------

Co-authored-by: slaren <slarengh@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
2024-03-22 19:00:01 +01:00
DAN™
2dd32d61b7 Remove undeed header file. (#6158) 2024-03-19 17:16:09 +01:00
Pierrick Hymbert
1b5523dc79 gguf-split: split and merge gguf per batch of tensors (#6135)
* gguf-split: split and merge gguf files per tensor

* gguf-split: build with make toolchain

* gguf-split: rename `--split-tensors-size` to `--split-max-tensors`. Set general.split_count KV to all split

* split : minor style + fix compile warnings

* gguf-split: remove --upload not implemented

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-19 12:05:44 +01:00