API: Fix num_experts_per_token reporting

The reported value wasn't linked to the model config. This value can
be 1 if a MoE model isn't loaded.
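A minimal sketch of the behavior described above (the class and function names here are hypothetical stand-ins, not the project's actual API): the reported value should be read straight from the loaded model's config, which defaults to 1 expert per token for non-MoE models.

```python
# Hypothetical sketch: report num_experts_per_token from the model config
# instead of an unlinked/hardcoded value.

class ModelConfig:
    def __init__(self, num_experts_per_token=1):
        # Non-MoE models keep the default of 1 expert per token
        self.num_experts_per_token = num_experts_per_token


class ModelContainer:
    def __init__(self, config):
        self.config = config


def get_model_info(container):
    # Pull the value from the loaded model's config at request time
    return {"num_experts_per_token": container.config.num_experts_per_token}


dense = ModelContainer(ModelConfig())                        # non-MoE model
moe = ModelContainer(ModelConfig(num_experts_per_token=2))   # MoE model

print(get_model_info(dense))  # {'num_experts_per_token': 1}
print(get_model_info(moe))    # {'num_experts_per_token': 2}
```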

Signed-off-by: kingbri <bdashore3@proton.me>
Author: kingbri
Date: 2023-12-28 00:31:14 -05:00
Parent: c5bbfd97b2
Commit: 3622710582
2 changed files with 2 additions and 1 deletion


@@ -111,6 +111,7 @@ async def get_current_model():
         max_seq_len=MODEL_CONTAINER.config.max_seq_len,
         cache_mode="FP8" if MODEL_CONTAINER.cache_fp8 else "FP16",
         prompt_template=prompt_template.name if prompt_template else None,
+        num_experts_per_token=MODEL_CONTAINER.config.num_experts_per_token,
     ),
     logging=gen_logging.PREFERENCES,
 )