OAI: Add cache_mode parameter to model

Forgot that the user can choose which cache mode to use
when loading a model.

Also add it when fetching model info.

Signed-off-by: kingbri <bdashore3@proton.me>
kingbri
2023-12-16 02:42:36 -05:00
parent ed868fd262
commit 1a331afe3a
2 changed files with 3 additions and 0 deletions


@@ -82,6 +82,7 @@ async def get_current_model():
rope_scale = model_container.config.scale_pos_emb,
rope_alpha = model_container.config.scale_alpha_value,
max_seq_len = model_container.config.max_seq_len,
cache_mode = "FP8" if model_container.cache_fp8 else "FP16",
prompt_template = unwrap(model_container.prompt_template, "auto")
),
logging = gen_logging.config