OAI: Add response_prefix and fix BOS token issues in chat completions

response_prefix adds a prefix before the next message is generated.
This is useful in many cases, such as continuing a prompt
(see #96).
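
As a rough illustration of the idea (not the code in this commit), a
response prefix can be thought of as text appended to the already
rendered prompt so that generation continues from it. The function
name and signature below are hypothetical.

```python
from typing import Optional


def apply_response_prefix(prompt: str, response_prefix: Optional[str] = None) -> str:
    """Hypothetical helper: append an optional prefix to a rendered prompt.

    The model then continues generating from the end of the prefix.
    """
    # No prefix given (None or empty string): leave the prompt untouched.
    if not response_prefix:
        return prompt
    return prompt + response_prefix
```

For example, passing the prompt "User: hi\nAssistant: " with the prefix
"Sure, " would make the model continue the assistant message after
"Sure, ".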

Also, if a template already specifies a BOS token, add_bos_token will
cause the prompt to start with two BOS tokens. Add a check that strips
a leading BOS token from the prompt if one exists.
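
A minimal sketch of such a check, assuming the prompt is a plain string
and the BOS token is known as text (for example "<s>"). The helper name
is hypothetical and this is not necessarily how the commit implements
it.

```python
def strip_leading_bos(prompt: str, bos_token: str) -> str:
    """Hypothetical helper: drop one leading BOS token from a prompt.

    This avoids a doubled BOS when the tokenizer adds its own BOS
    token during encoding.
    """
    # Guard against an empty bos_token, which every string "starts with".
    if bos_token and prompt.startswith(bos_token):
        return prompt[len(bos_token):]
    return prompt
```

Only a single leading occurrence is removed, so a prompt without a
leading BOS token passes through unchanged.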

Signed-off-by: kingbri <bdashore3@proton.me>
kingbri
2024-04-25 00:54:43 -04:00
parent ed7cd3cb59
commit fb1d2f34c1
4 changed files with 20 additions and 1 deletions


@@ -878,6 +878,7 @@ class ExllamaV2Container:
encode_special_tokens=True,
return_offsets=True,
)
print(ids)
mask = (
self.tokenizer.padding_mask(ids)
if self.use_cfg and gen_settings.cfg_scale not in [None, 1.0]