Load all MoE experts during warmup and make warmup 1 token (#198)

* Load all MoE experts during warmup

Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>

* Unify warmup to one token

---------

Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
saood06 authored 2025-02-10 09:40:38 -06:00 (committed by GitHub)
parent c12f73ba61
commit a366a3d17d
3 changed files with 17 additions and 10 deletions
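Background on the first bullet: on a MoE model, a default warmup decode only routes through the top-k experts chosen for the warmup tokens, so most expert tensors are never touched (and never paged in). The snippet below is only a hedged illustration of the usual way this is handled, not a quote from the files changed here; identifiers such as cparams.warmup, hparams.n_expert and hparams.n_expert_used merely mirror llama.cpp naming conventions.

    // Hypothetical sketch: while the context is in warmup mode, route the MoE FFN
    // through every expert instead of the configured top-k, so each expert's
    // weights are read once during the warmup decode.
    int32_t n_expert_used_eff = cparams.warmup ? hparams.n_expert : hparams.n_expert_used;
    // selected_experts then covers all experts during warmup
    struct ggml_tensor * selected_experts = ggml_top_k(ctx0, probs, n_expert_used_eff);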


@@ -2169,8 +2169,10 @@ struct llama_init_result llama_init_from_gpt_params(gpt_params & params) {
         if (bos != -1) {
             tmp.push_back(bos);
         }
-        tmp.push_back(eos);
+        else
+        {
+            tmp.push_back(eos);
+        }
         if (llama_model_has_encoder(model)) {
             llama_encode(lctx, llama_batch_get_one(tmp.data(), tmp.size(), 0, 0));
             llama_token decoder_start_token_id = llama_model_decoder_start_token(model);
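With this hunk applied, the warmup batch contains exactly one token: BOS when the vocabulary defines one, otherwise EOS. A minimal sketch of the resulting token selection, assuming the usual llama_token_bos(model) / llama_token_eos(model) lookups just above the hunk (the encoder-decoder branch in the context lines is unchanged):

    // Sketch of the post-patch warmup token selection (not the full warmup block).
    std::vector<llama_token> tmp;
    llama_token bos = llama_token_bos(model);
    llama_token eos = llama_token_eos(model);
    if (bos != -1) {
        tmp.push_back(bos);   // models with a BOS token warm up on BOS only
    } else {
        tmp.push_back(eos);   // models without BOS (e.g. T5) fall back to EOS
    }
    // encoder-decoder models then warm up the encoder and decode from the
    // decoder start token, as in the unchanged context lines above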