Do not allocate KV cache for unused layers (#843)

* Do not allocate KV cache for unused layers

* Do not apply experts weight scale if it is 1

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Kawrakow
2025-10-20 10:09:39 +03:00
committed by GitHub
parent 1789de5994
commit 22540cee60
2 changed files with 2 additions and 2 deletions

@@ -532,7 +532,7 @@ static bool llama_kv_cache_init(
     const struct llama_hparams & hparams = model.hparams;
-    const int64_t n_layer = hparams.n_layer;
+    const int64_t n_layer = hparams.n_layer - hparams.nextn_predict_layers;
     cache.has_shift = false;
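
The effect of the changed line can be sketched in isolation. This is a minimal illustration, not the project's code: `hparams_sketch` and `kv_cache_layer_count` are hypothetical names, the struct carries only the two fields the hunk reads, and the layer counts are made-up values. It assumes `nextn_predict_layers` counts the trailing layers that the commit title calls unused.

```cpp
#include <cstdint>

// Hypothetical stand-in for the two llama_hparams fields read by the hunk.
struct hparams_sketch {
    uint32_t n_layer;               // total number of layers in the model
    uint32_t nextn_predict_layers;  // layers assumed to be unused (per the commit title)
};

// Number of layers that get K/V buffers, mirroring the "+" line of the hunk.
// Both operands are widened to int64_t before subtracting so the unsigned
// fields cannot wrap around.
constexpr int64_t kv_cache_layer_count(const hparams_sketch & hp) {
    return static_cast<int64_t>(hp.n_layer) - static_cast<int64_t>(hp.nextn_predict_layers);
}

// Illustrative values only: 47 layers in total, 1 of them unused.
static_assert(kv_cache_layer_count({47, 1}) == 46, "one layer is skipped");
// With no unused layers the count is unchanged.
static_assert(kv_cache_layer_count({32, 0}) == 32, "nothing is skipped");
```

Under these assumptions, a cache that allocates one K tensor and one V tensor per layer would allocate `nextn_predict_layers` fewer pairs than before the change.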