Uvloop/Winloop does provide advantages to asyncio vs the standard
Proactor loop, so remove experimental status.
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
Torch - 2.6.0
ExllamaV2 - 0.2.8
Flash-attn - 2.7.4.post1
Cuda wheels are now 12.4 instead of 12.1, feature names need to be
migrated over.
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
Was against this for a while due to the length of timestamps clogging
the console, but it makes sense to know when something goes wrong.
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
On a basic python class, class attributes are handled by reference,
meaning that every instance of embeddings would attach to that reference
and allocate more memory.
Switch to a Pydantic class and factory methods when instantiating.
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
The previous template was compatible with Jinja2 in Python, but it
was not cross-platform compatible according to HF's standards.
Signed-off-by: kingbri <8082010+bdashore3@users.noreply.github.com>
The props endpoint is a standard used by llamacpp APIs which returns
various properties of a model to a server. It's still recommended to
use /v1/model to get all the parameters a TabbyAPI model has.
Also include the contents of a prompt template when fetching the current
model.
Signed-off-by: kingbri <8082010+bdashore3@users.noreply.github.com>
* Ensure that length of positive/negative prompt + max_tokens does not exceed max_seq_len
* Ensure that total required pages for CFG request does not exceed allocated cache_size
The vision module from the ExllamaV2 backend is used in files outside
the backends contained folder. Therefore, import ExllamaV2 as an
optional dependency here.
Signed-off-by: kingbri <bdashore3@proton.me>
The strings weren't being concatenated properly. Only add the combined
text if the chat completion type is a List.
Signed-off-by: kingbri <bdashore3@proton.me>
If vision is enabled and the model doesn't support it, send an
error asking the user to reload. Also, add a method to unload the
vision tower.
Signed-off-by: kingbri <bdashore3@proton.me>
The model_type internal reference was changed to an enum for
a more extendable loading process. Return the current model type
when loading a new model.
Signed-off-by: kingbri <bdashore3@proton.me>
Previously, the flow for parsing chat completion messages and rendering
from the prompt template was disconnected between endpoints. Now, create
a common function to render and handle everything appropriately afterwards.
Signed-off-by: kingbri <bdashore3@proton.me>
Migrate the add method into the class itself. Also, a BaseModel isn't
needed here since this isn't a serialized class.
Signed-off-by: kingbri <bdashore3@proton.me>
Previously, the messages were a list of dicts. These are untyped
and don't provide strict hinting. Add types for chat completion
messages and reformat existing code.
Signed-off-by: kingbri <bdashore3@proton.me>