Be able to repack tensors at run time (#147)

* Be able to repack tensors at run time

* Repack: also add bf16 as repackable type

* Repack: make sure number of rows is a multiple of the packing

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
This commit is contained in:
Kawrakow
2024-12-17 14:16:34 +01:00
committed by GitHub
parent c16d352915
commit a648191c2c
8 changed files with 146 additions and 6 deletions

View File

@@ -187,6 +187,7 @@ struct gpt_params {
bool no_kv_offload = false; // disable KV offloading
bool warmup = true; // warmup run
bool check_tensors = false; // validate tensor data
bool repack_tensors = false; // repack tensors if interleaved variant is available
std::string cache_type_k = "f16"; // KV cache data type for the K
std::string cache_type_v = "f16"; // KV cache data type for the V