A chat completion can now declare extra template_vars to pass when
a template is rendered, opening up the possibility of using state
outside of huggingface's parameters.
Signed-off-by: kingbri <bdashore3@proton.me>
response_format allows a user to request a valid, but arbitrary JSON
object from the API. This is a new part of the OAI spec.
Signed-off-by: kingbri <bdashore3@proton.me>
Wrong class attribute name used for max_attention_size and fixes
declaration of the draft model's chunk_size.
Also expose the parameter to the end user in both config and model
load.
Signed-off-by: kingbri <bdashore3@proton.me>
OAI expects finish_reason to be "stop" or "length" (there are others,
but they're not in the current scope of this project).
Make all completions and chat completions responses return this
from the model generation itself rather than putting a placeholder.
Signed-off-by: kingbri <bdashore3@proton.me>
This is a definite way to check if an authorized key is API or admin.
The endpoint only runs if the key is valid in the first place to keep
inline with the API's security model.
Signed-off-by: kingbri <bdashore3@proton.me>
Moving the API into its own directory helps compartmentalize it
and allows for cleaning up the main file to just contain bootstrapping
and the entry point.
Signed-off-by: kingbri <bdashore3@proton.me>