Commit Graph

46 Commits

Author SHA1 Message Date
kingbri
ab10b263fd Model: Add override base seq len
Some models (such as mistral and mixtral) set their base sequence
length to 32k due to assumptions of support for sliding window
attention.

Therefore, add this parameter to override the base sequence length
of a model which helps with auto-calculation of rope alpha.

If auto-calculation of rope alpha isn't being used, the max_seq_len
parameter works fine as is.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-20 00:45:39 -05:00
kingbri
c9e43e51aa API: Add route for draft model list
Does the same thing as model list except with draft models.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-19 23:45:53 -05:00
kingbri
ce2602df9a Model: Fix max seq len handling
Previously, the max sequence length was overriden by the user's
config and never took the model's config.json into account.

Now, set the default to 4096, but include config.prepare when
selecting the max sequence length. The yaml and API request
now serve as overrides rather than parameters.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-19 23:37:52 -05:00
kingbri
c3f7898967 OAI: Add logit bias support
Use exllamav2's token bias which is the functional equivalent of
OAI's logit bias parameter.

Strings are casted to integers on request and errors if an invalid
value is passed.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-18 23:53:47 -05:00
kingbri
bc21f0bbc0 OAI: Add field aliasing
Repetition penalty range needs field aliases to support multiple
parameter calls.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-18 23:53:47 -05:00
kingbri
de9a19b5d3 Templating: Add generation prompt appending
Append generation prompts if given the flag on an OAI chat completion
request.

This appends the "assistant" message to the instruct prompt. Defaults
to true since this is intended behavior.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-18 23:53:47 -05:00
kingbri
a87e474660 OAI: Fix chat completion validation
Validation wasn't properly run on older pydantic, so ChatCompletionRespChoice
was being sent instead of a ChatCompletionMessage when streaming
responses.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-18 23:53:47 -05:00
kingbri
e895eaa4bd OAI: Clarify types in docs
Adding field descriptions show which parameters are used solely for
OAI compliance and not actually parsed in the model code.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-18 23:53:47 -05:00
kingbri
51ca1ff396 Tree: Switch to Pydantic 2
Pydantic 2 has more modern methods and stability compared to Pydantic 1

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-18 23:53:47 -05:00
kingbri
f631dd6ff7 Templates: Switch to Jinja2
Jinja2 is a lightweight template parser that's used in Transformers
for parsing chat completions. It's much more efficient than Fastchat
and can be imported as part of requirements.

Also allows for unblocking Pydantic's version.

Users now have to provide their own template if needed. A separate
repo may be usable for common prompt template storage.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-18 23:53:47 -05:00
kingbri
ad8807a830 Model: Add support for num_experts_by_token
New parameter that's safe to edit in exllamav2 v0.0.11. Only recommended
for people who know what they're doing.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-17 18:03:01 -05:00
kingbri
70fbee3edd OAI: Fix model parameter placement
Accidentally edited the Model Card parameters vs the model load request
ones.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-17 14:36:28 -05:00
kingbri
1d0bdfa77c Model + OAI: Fix parameter parsing
Rope alpha changes don't require removing the 1.0 default
from Rope scale.

Keep defaults when possible to avoid errors.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-17 14:28:18 -05:00
Veden
3e57125025 OAI: adding optional draft model properties for draft_rope alpha and scale (#28)
* OAI: adding optional draft model properties for draft_rope alpha and scale

* Forgot to set the properties to None
2023-12-17 19:23:45 +00:00
kingbri
1a331afe3a OAI: Add cache_mode parameter to model
Mistakenly forgot that the user can choose what cache mode to use
when loading a model.

Also add when fetching model info.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-16 02:47:50 -05:00
kingbri
ed868fd262 OAI: Remove unused parameters
Seed and low_mem aren't used, so comment them out.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-15 14:56:43 -05:00
kingbri
083df7d585 Tree: Add generation logging support
Generations can be logged in the console along with sampling parameters
if the user enables it in config.

Metrics are always logged at the end of each prompt. In addition,
the model endpoint tells the user if they're being logged or not
for transparancy purposes.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-12 23:43:35 -05:00
kingbri
db87efde4a OAI: Add ability to specify fastchat prompt template
Sometimes fastchat may not be able to detect the prompt template from
the model path. Therefore, add the ability to set it in config.yml or
via the request object itself.

Also send the provided prompt template on model info request.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-10 15:43:58 -05:00
kingbri
fd9f3eac87 Model: Add params to current model endpoint
Grabs the current model rope params, max seq len, and the draft model
if applicable.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-10 00:40:56 -05:00
kingbri
5ae2a91c04 Tree: Use unwrap and coalesce for optional handling
Python doesn't have proper handling of optionals. The only way to
handle them is checking via an if statement if the value is None or
by using the "or" keyword to unwrap optionals.

Previously, I used the "or" method to unwrap, but this caused issues
due to falsy values falling back to the default. This is especially
the case with booleans were "False" changed to "True".

Instead, add two new functions: unwrap and coalesce. Both function
to properly implement a functional way of "None" coalescing.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-09 21:52:17 -05:00
DocShotgun
7380a3b79a Implement lora support (#24)
* Model: Implement basic lora support

* Add ability to load loras from config on launch
* Supports loading multiple loras and lora scaling
* Add function to unload loras

* Colab: Update for basic lora support

* Model: Test vram alloc after lora load, add docs

* Git: Add loras folder to .gitignore

* API: Add basic lora-related endpoints

* Add /loras/ endpoint for querying available loras
* Add /model/lora endpoint for querying currently loaded loras
* Add /model/lora/load endpoint for loading loras
* Add /model/lora/unload endpoint for unloading loras
* Move lora config-checking logic to main.py for better compat with API endpoints

* Revert bad CRLF line ending changes

* API: Add basic lora-related endpoints (fixed)

* Add /loras/ endpoint for querying available loras
* Add /model/lora endpoint for querying currently loaded loras
* Add /model/lora/load endpoint for loading loras
* Add /model/lora/unload endpoint for unloading loras
* Move lora config-checking logic to main.py for better compat with API endpoints

* Model: Unload loras first when unloading model

* API + Models: Cleanup lora endpoints and functions

Condenses down endpoint and model load code. Also makes the routes
behave the same way as model routes to help not confuse the end user.

Signed-off-by: kingbri <bdashore3@proton.me>

* Loras: Optimize load endpoint

Return successes and failures along with consolidating the request
to the rewritten load_loras function.

Signed-off-by: kingbri <bdashore3@proton.me>

---------

Co-authored-by: kingbri <bdashore3@proton.me>
Co-authored-by: DocShotgun <126566557+DocShotgun@users.noreply.github.com>
2023-12-08 23:38:08 -05:00
kingbri
f8e9e22c43 API: Fix model load endpoint with draft
Draft wasn't being parsed correctly with the new changes which removed
the draft_enabled bool. There's still some more work to be done with
returning exceptions.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-06 18:05:55 -05:00
kingbri
6a71890d45 Model: Fix sampler bugs
Lots of bugs were unearthed when switching to the new fallback changes.
Fix them and make sure samplers are being set properly.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-06 17:29:58 -05:00
kingbri
61f6e51fdb OAI: Add separator style fallback
Some models may return None for separator style with FastChat. Fall
back to LLAMA2 if this is the case.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-01 23:30:19 -05:00
kingbri
aef411bed5 OAI: Fix chat completion streaming
Chat completions require a finish reason to be provided in the OAI
spec once the streaming is completed. This is different from a non-
streaming chat completion response.

Also fix some errors that were raised from the endpoint.

References #15

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-01 00:14:24 -05:00
kingbri
e703c716ee Merge branch 'main' of https://github.com/ziadloo/tabbyAPI into ziadloo-main 2023-11-30 01:01:48 -05:00
kingbri
3957316b79 Revert "API: Rename repetition_decay -> repetition_slope"
This reverts commit cad144126f.

Change this parameter back to repetition_decay. This is different than
rep_pen_slope used in other backends such as kobold and NAI.

Still keep the fallback condition though.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-29 22:03:45 -05:00
kingbri
cad144126f API: Rename repetition_decay -> repetition_slope
Also fix the fallback to use 0 for sanity checking and validation.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-29 01:13:05 -05:00
kingbri
5cbf7f13da OAI: Fix repetition range
Alias repetition_penalty_range to repetition_range since that's used
as an internal variable. Perhaps in the future, there should be a function
that allows for iterating through request aliases and give a default value.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-29 00:53:19 -05:00
Mehran Ziadloo
ead503c75b Adding token usage support 2023-11-27 20:05:05 -08:00
kingbri
d47c39da54 API: Don't include draft directory in response
The draft directory should be returned for a draft model request (TBD).

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-23 00:07:56 -05:00
kingbri
71b9a53336 API: Add temperature_last support
Documented in previous commits. Also make sure that for version checking,
check the value of kwargs instead of if the key is present since requests
pass default values.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-21 21:20:59 -05:00
kingbri
f47919b1d3 API: Add draft model support
Models can be loaded with a child object called "draft" in the POST
request. Again, models need to be located within the draft model dir
to get loaded.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-19 00:32:25 -05:00
kingbri
d627d14385 API: Fix exceptions and defaults
Stop conditions was None, causing model to error out when trying to
add the EOS token to a None value.

Authentication failed when Bearer contained an empty string. To fix
this, add a condition which checks array length.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-17 17:56:05 -05:00
kingbri
282b5b2931 API: Fix responses and some params
Responses were not being properly sent as JSON. Only run pydantic's
JSON function on stream responses. FastAPI does the rest with static
responses.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-16 17:11:55 -05:00
kingbri
60eb076b43 Tree: Basic formatting and comments
Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-16 11:48:40 -05:00
kingbri
2248705c4a Requirements: Don't force fastchat installation
Fastchat requires a lot of dependencies such as transformers, peft,
and accelerate which are heavy. This is not useful unless a user
wants to add a shim for the chat completion endpoint.

Instead, try importing fastchat and notify the console of the error.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-16 01:26:46 -05:00
kingbri
5e8419ec0c OAI: Add chat completions endpoint
Chat completions is the endpoint that will be used by OAI in the
future. Makes sense to support it even though the completions
endpoint will be used more often.

Also unify common parameters between the chat completion and completion
requests since they're very similar.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-16 01:06:07 -05:00
kingbri
d0b6b11068 OAI: Make freq and presence pen floats
Also rename the completions typing file.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-15 00:55:15 -05:00
kingbri
126afdfdc2 Model: Fix gpu split params
GPU split auto is a bool and GPU split is an array of integers for
GBs to allocate per GPU.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-15 00:55:15 -05:00
kingbri
ea91d17a11 Api: Add ban_eos_token and add_bos_token support
Adds the ability for the client to specify whether to add the BOS
token and ban the EOS token.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-15 00:55:15 -05:00
kingbri
8fea5391a8 Api: Add token endpoints
Support for encoding and decoding with various parameters.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-15 00:55:15 -05:00
kingbri
4670a77c26 API: Don't use response_class
This arg in routes caused many errors and isn't even needed for
responses.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-14 22:09:26 -05:00
kingbri
b625bface9 OAI: Add API-based model loading/unloading and auth routes
Models can be loaded and unloaded via the API. Also add authentication
to use the API and for administrator tasks.

Both types of authorization use different keys.

Also fix the unload function to properly free all used vram.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-14 01:17:19 -05:00
kingbri
47343e2f1a OAI: Add models support
The models endpoint fetches all the models that OAI has to offer.
However, since this is an OAI clone, just list the models inside
the user's configured model directory instead.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-13 21:38:34 -05:00
kingbri
eee8b642bd OAI: Implement completion API endpoint
Add support for /v1/completions with the option to use streaming
if needed. Also rewrite API endpoints to use async when possible
since that improves request performance.

Model container parameter names also needed rewrites as well and
set fallback cases to their disabled values.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-13 18:31:26 -05:00