PyTorch released 2.2 without letting the community know first. Pin
the torch version to 2.1.2 until exllamav2 builds for torch 2.2.
Signed-off-by: kingbri <bdashore3@proton.me>
The example JSON fields were changed because of the new sampler
default strategy. Fix these by manually changing the values.
Also add support for fasttensors and expose generate_window to
the API. It's recommended to not adjust generate_window as it's
dynamically scaled based on max_seq_len by default.
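A minimal sketch of how a default generate_window could scale with max_seq_len. The rule used here (1/8 of the context, with a floor of 512 tokens) and the function name are assumptions for illustration, not the project's actual formula.

```python
from typing import Optional


def resolve_generate_window(max_seq_len: int, generate_window: Optional[int] = None) -> int:
    """Return how many tokens to keep free for generation (hypothetical rule)."""
    if generate_window is not None:
        # An explicit value wins, even though adjusting it isn't recommended.
        return generate_window
    # Assumed scaling: reserve 1/8 of the context, but never fewer than 512 tokens.
    return max(max_seq_len // 8, 512)


print(resolve_generate_window(4096))   # 512
print(resolve_generate_window(16384))  # 2048
```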
Signed-off-by: kingbri <bdashore3@proton.me>
The previous commit iterated through multiple try conditions, which
forced the user to provide a dummy prompt template. Now, template
loading is fallback-based: run through a list of loader functions
and return as soon as one of them succeeds.
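A minimal sketch of the fallback-based loading loop. The loader names and the template string are hypothetical stand-ins; each loader either returns a template, returns None, or raises, and the first success wins.

```python
from typing import Callable, Optional, Sequence


def load_first(loaders: Sequence[Callable[[], Optional[str]]]) -> Optional[str]:
    """Try each loader in order and return the first template that loads."""
    for loader in loaders:
        try:
            template = loader()
        except Exception:
            # Any failure just means "fall back to the next source".
            continue
        if template is not None:
            return template
    # Nothing succeeded; the caller decides whether that's an error.
    return None


# Hypothetical sources, tried in priority order.
def from_user_setting() -> Optional[str]:
    raise FileNotFoundError("no template file named 'custom'")


def from_model_config() -> Optional[str]:
    return None  # the model doesn't ship a template


def from_tokenizer_config() -> Optional[str]:
    return "{{ messages }}"  # placeholder template text


template = load_first([from_user_setting, from_model_config, from_tokenizer_config])
print(template)  # {{ messages }}
```

No dummy template is needed: if the user-specified source fails, the loop simply moves on to the next one.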
Signed-off-by: kingbri <bdashore3@proton.me>
Allow adjusting the space reserved at the end of the context before
it's rolled. This value should be scaled up as a model's max_seq_len
increases.
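A sketch of what reserving space before rolling could look like, assuming "rolling" drops the oldest tokens so that generate_window slots remain free at the end of the context. roll_context is a hypothetical helper.

```python
from typing import List


def roll_context(tokens: List[int], max_seq_len: int, generate_window: int) -> List[int]:
    """Drop the oldest tokens so generate_window slots stay free at the end."""
    if not 0 <= generate_window < max_seq_len:
        raise ValueError("generate_window must be in [0, max_seq_len)")
    budget = max_seq_len - generate_window  # room left for the prompt/history
    if len(tokens) <= budget:
        return tokens
    return tokens[-budget:]


print(roll_context(list(range(10)), max_seq_len=8, generate_window=3))  # [5, 6, 7, 8, 9]
```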
Signed-off-by: kingbri <bdashore3@proton.me>
Allow users to switch the currently overridden samplers via the API
so a restart isn't required to change the overrides.
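A minimal sketch, assuming overrides are stored as named presets, of swapping the active sampler overrides at runtime so an API handler can change them without a restart. SamplerOverrides and the preset names are hypothetical.

```python
import threading
from typing import Any, Dict


class SamplerOverrides:
    """Hypothetical holder for server-side sampler overrides, swappable at runtime."""

    def __init__(self, presets: Dict[str, Dict[str, Any]]) -> None:
        self._presets = presets
        self._lock = threading.Lock()  # requests may read while an endpoint switches
        self._active: Dict[str, Any] = {}

    def switch(self, preset_name: str) -> None:
        if preset_name not in self._presets:
            raise KeyError(f"unknown override preset: {preset_name}")
        with self._lock:
            self._active = dict(self._presets[preset_name])

    def current(self) -> Dict[str, Any]:
        with self._lock:
            return dict(self._active)  # copy so callers can't mutate shared state


overrides = SamplerOverrides({
    "default": {"temperature": 1.0},
    "creative": {"temperature": 1.3, "top_p": 0.95},
})
overrides.switch("creative")  # e.g. called from an API endpoint handler
print(overrides.current())  # {'temperature': 1.3, 'top_p': 0.95}
```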
Signed-off-by: kingbri <bdashore3@proton.me>
Unify API sampler params into a superclass, which should make them
easier to manage and let request classes inherit generic functions
from it.
Not all frontends expose every sampling parameter because they're
built around OAI's API (which handles sampling itself, apart from a
few sliders).
Add the ability for the user to customize fallback parameters
server-side.
In addition, parameters can be forced to a certain value server-side
in case a frontend automatically sets other sampler values in the
background that the user doesn't want.
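A sketch of the superclass idea using dataclasses (the project may use a different model library). BaseSamplerRequest, CompletionRequest and resolve() are hypothetical names. Forced values always win, fallbacks fill in anything the frontend left unset, and values the client did send pass through.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional


@dataclass
class BaseSamplerRequest:
    """Hypothetical shared sampler params inherited by every request type."""

    temperature: Optional[float] = None
    top_p: Optional[float] = None
    repetition_penalty: Optional[float] = None

    def resolve(self, fallbacks: Dict[str, Any], forced: Dict[str, Any]) -> Dict[str, Any]:
        resolved: Dict[str, Any] = {}
        # Only walk the sampler fields, not fields added by subclasses.
        for f in fields(BaseSamplerRequest):
            value = getattr(self, f.name)
            if f.name in forced:
                value = forced[f.name]  # server-side force always wins
            elif value is None:
                value = fallbacks.get(f.name)  # fill what the frontend didn't send
            resolved[f.name] = value
        return resolved


@dataclass
class CompletionRequest(BaseSamplerRequest):
    prompt: str = ""


req = CompletionRequest(prompt="Hi", temperature=0.7)
print(req.resolve({"temperature": 1.0, "top_p": 0.9}, {"repetition_penalty": 1.0}))
```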
Signed-off-by: kingbri <bdashore3@proton.me>
Move common functions into their own folder and refactor the backends
to use their own folder as well.
Also clean up imports and alphabetize the import statements themselves.
Finally, move colab and docker into their own folders as well.
Signed-off-by: kingbri <bdashore3@proton.me>
Helps with understanding the API aliases. These aliases should not be
used, but they're helpful for developers who want frontend compatibility.
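To illustrate what an alias does, here is a hypothetical alias table and normalizer. The alias names are made up for the example and do not reflect the project's actual aliases; the canonical name takes precedence when a request sends both spellings.

```python
from typing import Any, Dict

# Hypothetical alias table: frontend-specific name -> canonical parameter name.
PARAM_ALIASES = {"max_length": "max_tokens", "rep_pen": "repetition_penalty"}


def normalize_params(raw: Dict[str, Any]) -> Dict[str, Any]:
    normalized: Dict[str, Any] = {}
    for key, value in raw.items():
        canonical = PARAM_ALIASES.get(key, key)
        # If the canonical name was already sent, ignore the alias spelling.
        if canonical in normalized and key != canonical:
            continue
        normalized[canonical] = value
    return normalized


print(normalize_params({"max_length": 200, "rep_pen": 1.1}))
# {'max_tokens': 200, 'repetition_penalty': 1.1}
```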
Signed-off-by: kingbri <bdashore3@proton.me>
Previously, if model_name was commented out, a load would not occur.
Also handle the case where model_name or loras is left blank, since a
blank value parses to None in the YAML.
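A sketch of the blank-value handling, assuming the parsed YAML arrives as plain dicts. A commented-out key is simply missing, while a key written as `model_name:` with no value parses to None, so both cases are covered by `.get()`. The helper names are hypothetical.

```python
from typing import Any, Dict, List


def should_load_model(model_config: Dict[str, Any]) -> bool:
    # Missing key, None (blank YAML value) and "" all mean "don't load".
    return bool(model_config.get("model_name"))


def loras_to_load(lora_config: Dict[str, Any]) -> List[Dict[str, Any]]:
    # "loras:" left blank parses to None, so normalize it to an empty list.
    return lora_config.get("loras") or []


print(should_load_model({"model_name": None}))  # False
print(loras_to_load({"loras": None}))  # []
```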
Signed-off-by: kingbri <bdashore3@proton.me>
Fall back to the BOS token since an empty string won't do anything.
Ideally, an empty negative prompt should not be used, but it's not
the end of the world.
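A minimal sketch of the fallback, with a hypothetical helper name and a placeholder BOS string; the real BOS token comes from the model's tokenizer.

```python
from typing import Optional


def resolve_negative_prompt(negative_prompt: Optional[str], bos_token: str) -> str:
    """Use the BOS token when the negative prompt is empty or missing."""
    # An empty string typically tokenizes to nothing, leaving CFG no negative context.
    if not negative_prompt:
        return bos_token
    return negative_prompt


print(resolve_negative_prompt("", "<s>"))  # <s>
```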
Signed-off-by: kingbri <bdashore3@proton.me>
CFG, or classifier-free guidance, helps push a model in different
directions based on what the user provides.
Currently, CFG is ignored if the negative prompt is blank (it shouldn't
be used that way anyway).
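For reference, the standard CFG formulation combines the log-probabilities of the positive and negative prompts as neg + scale * (pos - neg). The sketch below assumes a cfg_scale-style parameter and plain Python lists for logits; exllamav2's exact implementation may differ.

```python
import math
from typing import List


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]


def cfg_combine(pos_logits: List[float], neg_logits: List[float], cfg_scale: float) -> List[float]:
    """Blend positive/negative prompt distributions: neg + scale * (pos - neg)."""
    pos = log_softmax(pos_logits)
    neg = log_softmax(neg_logits)
    # cfg_scale = 1.0 reproduces the positive prompt; > 1.0 pushes away from the negative one.
    return [n + cfg_scale * (p - n) for p, n in zip(pos, neg)]


guided = cfg_combine([2.0, 0.0], [0.0, 2.0], cfg_scale=2.0)
```

With a scale of 2.0, the gap between the two tokens grows from 2 (positive prompt alone) to 6, since the negative prompt favored the other token.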
Signed-off-by: kingbri <bdashore3@proton.me>
Add an argparser whose parsed args are cast into dictionaries of
subgroups so they integrate with the config.
This argparser doesn't contain everything in the config because some
options are too complex to express as CLI args, but it will eventually
reach parity. In addition, it's used to override values in config.yml
rather than replace the file.
A config arg is also provided if the user wants to replace config.yml
entirely with a YAML file at another path.
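A sketch of casting argparse results into per-subgroup dicts and layering them over the config file. The argument names, group names and helpers here are hypothetical, and only flags the user actually passes override the file.

```python
import argparse
from typing import Any, Dict, List


def build_parser():
    parser = argparse.ArgumentParser(description="Hypothetical subset of server args")
    parser.add_argument("--config", type=str, help="Use this YAML file instead of config.yml")
    group_dests: Dict[str, List[str]] = {}

    # add_argument returns the Action, whose .dest is the attribute name on the namespace.
    network = parser.add_argument_group("network")
    group_dests["network"] = [
        network.add_argument("--host", type=str).dest,
        network.add_argument("--port", type=int).dest,
    ]
    model = parser.add_argument_group("model")
    group_dests["model"] = [
        model.add_argument("--model-name", type=str).dest,  # dest: model_name
        model.add_argument("--max-seq-len", type=int).dest,  # dest: max_seq_len
    ]
    return parser, group_dests


def args_to_subgroups(args: argparse.Namespace, group_dests: Dict[str, List[str]]):
    values = vars(args)
    # Unset flags default to None, so drop them to avoid clobbering the file.
    return {
        group: {dest: values[dest] for dest in dests if values[dest] is not None}
        for group, dests in group_dests.items()
    }


def override_config(config: Dict[str, Dict[str, Any]], overrides: Dict[str, Dict[str, Any]]):
    merged = {section: dict(values) for section, values in config.items()}
    for section, values in overrides.items():
        merged.setdefault(section, {}).update(values)
    return merged


parser, group_dests = build_parser()
args = parser.parse_args(["--port", "6000", "--model-name", "my-model"])
file_config = {"network": {"host": "127.0.0.1", "port": 5000}, "model": {"max_seq_len": 4096}}
merged = override_config(file_config, args_to_subgroups(args, group_dests))
```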
Signed-off-by: kingbri <bdashore3@proton.me>
The appropriate branches weren't firing when frequency penalty was
0.0. Also fix repetition penalty overriding.
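A common way for a Python branch to skip 0.0 is a truthiness check, since 0.0 is falsy. This is an assumption about the kind of bug involved, not a description of the original code; the sketch just contrasts the two checks.

```python
from typing import Optional


def is_set_buggy(frequency_penalty: Optional[float]) -> bool:
    # 0.0 is falsy, so an explicit 0.0 looks the same as "not provided".
    return bool(frequency_penalty)


def is_set(frequency_penalty: Optional[float]) -> bool:
    # Comparing against None lets an explicit 0.0 reach its branch.
    return frequency_penalty is not None


print(is_set_buggy(0.0), is_set(0.0))  # False True
```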
Signed-off-by: kingbri <bdashore3@proton.me>
Previous behavior aliased freq pen to rep pen. Keep this behavior
when the freq pen parameter is used with a legacy exllamav2 version
rather than ignoring both entirely.
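A hypothetical sketch of the legacy fallback: detect frequency penalty support by attribute, and otherwise route the value into repetition penalty. The settings classes are stand-ins whose attribute names are assumed to resemble exllamav2's sampler settings, and the precedence rule (an explicit rep pen wins) is also an assumption.

```python
from typing import Optional


class LegacySettings:
    """Stand-in for an older sampler settings object (no frequency penalty)."""

    def __init__(self) -> None:
        self.token_repetition_penalty = 1.0


class ModernSettings(LegacySettings):
    """Stand-in for a newer settings object that supports frequency penalty."""

    def __init__(self) -> None:
        super().__init__()
        self.token_frequency_penalty = 0.0


def apply_penalties(settings, repetition_penalty: Optional[float], frequency_penalty: Optional[float]):
    if hasattr(settings, "token_frequency_penalty"):
        # Newer backend: each penalty goes to its own knob.
        if frequency_penalty is not None:
            settings.token_frequency_penalty = frequency_penalty
        if repetition_penalty is not None:
            settings.token_repetition_penalty = repetition_penalty
    else:
        # Legacy backend: keep the old alias instead of dropping both values.
        # Assumed precedence: an explicit repetition_penalty wins.
        value = repetition_penalty if repetition_penalty is not None else frequency_penalty
        if value is not None:
            settings.token_repetition_penalty = value
    return settings


legacy = apply_penalties(LegacySettings(), None, 1.15)
print(legacy.token_repetition_penalty)  # 1.15
```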
Signed-off-by: kingbri <bdashore3@proton.me>
With the new wiki, all parameters are fully documented, along with
comments in the YAML file itself. This should help new users who
pull the repo, copy the config, and then can't start the API because
uncommented subsections are being read.
Signed-off-by: kingbri <bdashore3@proton.me>
In newer versions of exllamav2, this value is read from the model's
config.json. This value will still default to 1.0 anyway.
Signed-off-by: kingbri <bdashore3@proton.me>
All penalties can have a sustain (range) applied to them in exl2,
so clarify the parameter.
However, the default behavior changes based on whether freq OR pres
pen is enabled. For the sanity of OAI users, have freq and pres pen
apply only to the output tokens when the range is -1 (default).
Repetition penalty still works the same way: -1 means the range is
the max seq len.
This prevents gibberish output when using the more modern freq and
presence penalties, similar to llama.cpp.
NOTE: This logic is still subject to change in the future, but I believe
it hits the happy medium for users who want defaults and users who want
to tinker around with the sampling knobs.
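A sketch of these range rules, assuming tokens are plain integer IDs and that a range of 0 disables the penalty. penalty_window and apply_freq_pres are hypothetical helpers; the frequency/presence formula is the OpenAI-style one (subtract count * frequency_penalty + presence_penalty from each seen token's logit).

```python
from collections import Counter
from typing import Dict, List


def penalty_window(prompt: List[int], generated: List[int], penalty_range: int,
                   max_seq_len: int, output_only_by_default: bool) -> List[int]:
    """Return the tokens a penalty should look at."""
    context = prompt + generated
    if penalty_range == -1:
        # Default range: freq/pres pen see only the output, rep pen sees up to max_seq_len.
        return list(generated) if output_only_by_default else context[-max_seq_len:]
    if penalty_range <= 0:
        return []  # assumption: 0 disables the window
    return context[-penalty_range:]


def apply_freq_pres(logits: Dict[int, float], window: List[int],
                    frequency_penalty: float, presence_penalty: float) -> Dict[int, float]:
    counts = Counter(window)
    penalized = dict(logits)
    for token, count in counts.items():
        if token in penalized:
            penalized[token] -= count * frequency_penalty + presence_penalty
    return penalized


prompt, generated = [7, 7, 7, 3], [3, 5, 5]
freq_window = penalty_window(prompt, generated, -1, 4096, output_only_by_default=True)
rep_window = penalty_window(prompt, generated, -1, 4096, output_only_by_default=False)
scored = apply_freq_pres({3: 1.0, 5: 1.0, 7: 1.0}, freq_window, 0.5, 0.25)
```

With the default range, token 7, which repeats only in the prompt, is left alone by frequency and presence penalty, while repetition penalty would still see it.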
Signed-off-by: kingbri <bdashore3@proton.me>
Python can be called directly for requirements checking. Remove the
ps1 script and create a venv purely in batch.
Signed-off-by: kingbri <bdashore3@proton.me>