Be able to repack tensors at run time (#147)

* Be able to repack tensors at run time
* Repack: also add bf16 as repackable type
* Repack: make sure number of rows is a multiple of the packing

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2026-04-29 02:41:47 +00:00 · 2024-12-17 14:16:34 +01:00
parent c16d352915
commit a648191c2c
8 changed files with 146 additions and 6 deletions
--- a/common/common.h
+++ b/common/common.h
@@ -187,6 +187,7 @@ struct gpt_params {
    bool no_kv_offload     = false; // disable KV offloading
    bool warmup            = true;  // warmup run
    bool check_tensors     = false; // validate tensor data
+    bool repack_tensors    = false; // repack tensors if interleaved variant is available

    std::string cache_type_k = "f16"; // KV cache data type for the K
    std::string cache_type_v = "f16"; // KV cache data type for the V