🔀 #274 - Specify tensor name regex for tensors to be repacked

Author	`ikawrakow`
State	❌ Closed
Created	2025-03-21
Updated	2025-03-21

Description

This PR follows in the footsteps of #272 and adds the ability to specify one or more regular expressions to use for matching tensor names to be repacked. This is useful for hybrid GPU/CPU inference where one will want to repack only the tensors that stay on the CPU.

Usage

./bin/llama-quantize --repack --repack-pattern regex1,regex2,... some_model output_file_name quant_type

For example, if inference uses the tensor override `-ot exps=CPU` to keep the DeepSeek MoE experts on the CPU, one would use

./bin/llama-quantize --repack --repack-pattern exps some_model output_file_name quant_type

to repack an existing model.
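
To make the pattern selection concrete, here is a minimal Python sketch of how a comma-separated list of regular expressions could pick out tensor names. It is not the code from this PR. The helper names `compile_repack_patterns` and `should_repack` are hypothetical, the tensor names are only illustrative, and the partial-match (search) semantics are an assumption suggested by the `exps` example above.

```python
import re

def compile_repack_patterns(arg: str) -> list[re.Pattern]:
    """Split a comma-separated pattern list and compile each non-empty piece."""
    return [re.compile(p) for p in arg.split(",") if p]

def should_repack(tensor_name: str, patterns: list[re.Pattern]) -> bool:
    # Assumption: a pattern selects a tensor if it matches anywhere in the name.
    return any(p.search(tensor_name) for p in patterns)

# Illustrative tensor names in the style of a DeepSeek MoE GGUF file.
names = [
    "blk.3.attn_q.weight",
    "blk.3.ffn_down_exps.weight",
    "blk.3.ffn_gate_exps.weight",
    "blk.3.ffn_up_exps.weight",
    "output.weight",
]

patterns = compile_repack_patterns("exps")
selected = [n for n in names if should_repack(n, patterns)]
print(selected)
```

Under these assumptions only the three `*_exps` tensors are selected, while the attention and output tensors are left untouched. One limitation of this sketch is that splitting on commas means a pattern that itself contains a comma (such as a `{1,3}` quantifier) cannot be expressed.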
