mirror of https://github.com/theroyallab/tabbyAPI.git
API: Re-add BOS token stripping in template render
Matching YALS, if the model has add_bos_token enabled, then remove an extra
BOS token at the start of the prompt. This usually happens with misconfigured
templates such as Llama 3.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
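For context, a minimal, self-contained sketch of the stripping rule this commit re-adds. The helper strip_leading_bos and the literal <|begin_of_text|> token are illustrative stand-ins; tabbyAPI reads these values from the model container and template vars rather than taking them as arguments:

    # Hypothetical standalone illustration of the double-BOS guard.
    def strip_leading_bos(prompt: str, bos_token: str | None, add_bos_token: bool) -> str:
        """Drop a textual BOS at the start of a rendered prompt when the
        tokenizer would prepend another one during encoding."""
        if bos_token and add_bos_token and prompt.startswith(bos_token):
            return prompt.removeprefix(bos_token)
        return prompt

    # A misconfigured template (e.g. Llama 3) bakes the BOS into the rendered text:
    rendered = "<|begin_of_text|>Hello"
    assert strip_leading_bos(rendered, "<|begin_of_text|>", add_bos_token=True) == "Hello"
    # With add_bos_token disabled, the prompt is left untouched:
    assert strip_leading_bos(rendered, "<|begin_of_text|>", add_bos_token=False) == rendered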
@@ -286,6 +286,16 @@ async def apply_chat_template(
             "add_generation_prompt is False"
         )
 
+    # Removes the starting BOS token if the model adds one
+    # This is to prevent add_bos_token from adding multiple bos tokens
+    bos_token = template_vars.get("bos_token")
+    if (
+        bos_token
+        and model.container.hf_model.add_bos_token()
+        and prompt.startswith(bos_token)
+    ):
+        prompt = prompt.removeprefix(bos_token)
+
     # Add template metadata
     await _append_template_metadata(data, template_vars)
 