Using the Outlines library, add support to supply EBNF strings and
pass them to the library for parsing.
From there, a wrapper is created and a filter is passed to generation.
Replace with an in-house solution at some point that's more flexible.
Signed-off-by: kingbri <bdashore3@proton.me>
Add the ability to constrain the return value of a model to be JSON.
Built using the JSON schema standard to define the properties of what
the model should return.
This feature should be more accurate than using GBNF/EBNF to yield
the same results due to the use of lmformatenforcer.
GBNF/EBNF will be added in a different commit/branch.
Signed-off-by: kingbri <bdashore3@proton.me>
This option saves some VRAM, but does have the chance to error out.
Add this in the experimental config section.
Signed-off-by: kingbri <bdashore3@proton.me>
Injecting into Pydantic fields caused issues with serialization for
documentation rendering. Rather than reinvent the wheel again,
switch to a chain of if statements for now. This may change in the
future if subclasses from the base sampler request need to be
validated as well.
Signed-off-by: kingbri <bdashore3@proton.me>
Rather than maintaining yet another function to validate sampler
ranges/values, embed them in fields which allows for less
maintainence in the future.
Also add validation for existing samplers that can corrupt
the sampling stack if set improperly.
Signed-off-by: kingbri <bdashore3@proton.me>
Returns token offsets, selected tokens, probabilities of tokens
post-sampling, and normalized probability of selecting a token
pre-sampling (for efficiency purposes).
Only for text completions. Chat completions in a later commit.
Signed-off-by: kingbri <bdashore3@proton.me>
Many APIs automatically ask for request streaming without giving
the user the option to turn it off. Therefore, give the user more
freedom by giving a server-side kill switch.
Signed-off-by: kingbri <bdashore3@proton.me>
Add the ability to use an unsafe config flag if needed and migrate
the exl2 check to a different file within the exl2 backend code.
Signed-off-by: kingbri <bdashore3@proton.me>
Cleanup how overrides are handled, class naming, and adopt exllamav2's
model class to enforce latest stable version methods rather than
adding multiple backwards compatability checks.
Signed-off-by: kingbri <bdashore3@proton.me>
Does not work if max_temp is less than or equal to min_temp. Sampler
validation will have to be refactored in the future, so the dynamic
temperature check will also be changed.
Signed-off-by: kingbri <bdashore3@proton.me>
The example JSON fields were changed because of the new sampler
default strategy. Fix these by manually changing the values.
Also add support for fasttensors and expose generate_window to
the API. It's recommended to not adjust generate_window as it's
dynamically scaled based on max_seq_len by default.
Signed-off-by: kingbri <bdashore3@proton.me>
Allow users to switch the currently overriden samplers via the API
so a restart isn't required to switch the overrides.
Signed-off-by: kingbri <bdashore3@proton.me>
Unify API sampler params into a superclass which should make them
easier to manage and inherit generic functions from.
Not all frontends expose all sampling parameters due to connections
with OAI (that handles sampling themselves with the exception of
a few sliders).
Add the ability for the user to customize fallback parameters from
server-side.
In addition, parameters can be forced to a certain value server-side
in case the repo automatically sets other sampler values in the
background that the user doesn't want.
Signed-off-by: kingbri <bdashore3@proton.me>
Move common functions into their own folder and refactor the backends
to use their own folder as well.
Also cleanup imports and alphabetize import statments themselves.
Finally, move colab and docker into their own folders as well.
Signed-off-by: kingbri <bdashore3@proton.me>