Commit Graph

25 Commits

kingbri
94696543bc Model: Warn user if context > max_seq_len
Unlike other backends, tabby attempts to generate even if the context
is greater than the max sequence length, by truncating the given
context.

Rather than artificially erroring out, give a warning that the output
console metrics are going to be incorrect and that the user should
make sure context <= max_seq_len.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-29 01:35:32 -05:00
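A minimal sketch of the check the commit above describes: warn instead of erroring when the prompt exceeds the window. Names like `check_context`, `context_len`, and `max_seq_len` are illustrative, not tabbyAPI's actual API.

```python
import logging

logger = logging.getLogger(__name__)

def check_context(context_len: int, max_seq_len: int) -> bool:
    """Return True if the context fits; otherwise warn and return False.

    The backend still generates (the prompt is truncated), so this only
    flags that the reported console metrics will be inaccurate.
    """
    if context_len > max_seq_len:
        logger.warning(
            "Context length %d exceeds max_seq_len %d; the prompt will be "
            "truncated and console metrics will be inaccurate.",
            context_len,
            max_seq_len,
        )
        return False
    return True
```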
kingbri
cad144126f API: Rename repetition_decay -> repetition_slope
Also fix the fallback to use 0 for sanity checking and validation.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-29 01:13:05 -05:00
kingbri
d47c39da54 API: Don't include draft directory in response
The draft directory should be returned for a draft model request (TBD).

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-23 00:07:56 -05:00
kingbri
71b9a53336 API: Add temperature_last support
Documented in previous commits. Also, for version checking, check the
value in kwargs rather than whether the key is present, since requests
pass default values.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-21 21:20:59 -05:00
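The kwargs point above can be sketched as follows. Because requests always supply a default (e.g. `temperature_last=False`), a presence check like `"temperature_last" in kwargs` is always true; the value must be tested instead. The helper name is hypothetical.

```python
def uses_temperature_last(kwargs: dict) -> bool:
    """Decide whether the request actually enables temperature_last.

    Wrong approach: `"temperature_last" in kwargs` -- this is True even
    when the client only passed the default value of False.
    """
    return bool(kwargs.get("temperature_last", False))
```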
turboderp
3337fe6acc Warning if unsupported samplers are used 2023-11-21 18:35:22 +01:00
turboderp
a54de11cf3 Add new samplers 2023-11-21 18:16:53 +01:00
Veden
f960fac8ff Fix incorrect ratio calculation for draft model 2023-11-19 13:12:53 -08:00
kingbri
4cddd0400c Model: Fix draft model loading
Use draft_config to find the path instead of kwargs.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-19 02:04:02 -05:00
kingbri
31bc418795 Model: Add context in response output
When printing to the console, give information about the context
(ingestion token count).

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-19 00:49:32 -05:00
kingbri
6b9af58cc1 Tree: Fix extraneous bugs and update T/s print
Model: Add extra information to print and fix the divide by zero error.
Auth: Fix validation of API and admin keys to look for the entire key.

References #7 and #6

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-18 22:34:40 -05:00
Brian Dashore
b2410a0436 Merge pull request #4 from waldfee/config_samples
Adds draft model support to config.yml
2023-11-18 13:16:23 -05:00
kingbri
27ebec3b35 Model: Add speculative decoding support via config
Speculative decoding makes use of draft models that ingest the prompt
before forwarding it to the main model.

Add options in the config to support this. API options will occur
in a different commit.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-18 01:42:20 -05:00
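A hypothetical shape for the config options the commit above adds; the key names below are assumptions for illustration, not tabbyAPI's actual schema.

```yaml
# Illustrative only -- key names are assumptions, not the real schema.
draft:
  draft_model_name: my-draft-model   # smaller model used for speculation
  draft_rope_alpha: 1.0
```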
kingbri
2ad79cb9ea Model: Add tokens in responses
Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-17 23:33:48 -05:00
kingbri
9dfa580b1e Model: Add tokens/second output
Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-17 01:16:20 -05:00
kingbri
d5551352bf Model: Fix parsing of stop conditions
Add the EOS token to the stop strings after checking kwargs. If
ban_eos_token is on, don't add the EOS token, for good measure.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-16 17:15:33 -05:00
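The stop-condition logic above can be sketched as a small helper; the function and parameter names are illustrative, not the repo's actual code.

```python
def build_stop_strings(stop: list, eos_token: str, ban_eos_token: bool) -> list:
    """Append the EOS token to the stop strings unless it is banned."""
    stop = list(stop)  # don't mutate the caller's list
    if not ban_eos_token and eos_token not in stop:
        stop.append(eos_token)
    return stop
```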
kingbri
126afdfdc2 Model: Fix gpu split params
GPU split auto is a bool and GPU split is an array of integers for
GBs to allocate per GPU.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-15 00:55:15 -05:00
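A sketch of the two GPU-split settings described above, assuming hypothetical names: `gpu_split_auto` is a bool, and `gpu_split` is a list of per-GPU allocations in GB.

```python
def resolve_gpu_split(gpu_split_auto: bool, gpu_split):
    """Return per-GPU gigabyte allocations, or None for automatic split."""
    if gpu_split_auto:
        return None  # let the backend allocate across GPUs automatically
    return [float(gb) for gb in gpu_split]
```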
kingbri
ea91d17a11 Api: Add ban_eos_token and add_bos_token support
Adds the ability for the client to specify whether to add the BOS
token and ban the EOS token.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-15 00:55:15 -05:00
kingbri
8fea5391a8 Api: Add token endpoints
Support for encoding and decoding with various parameters.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-15 00:55:15 -05:00
kingbri
b625bface9 OAI: Add API-based model loading/unloading and auth routes
Models can be loaded and unloaded via the API. Also add authentication
to use the API and for administrator tasks.

Both types of authorization use different keys.

Also fix the unload function to properly free all used VRAM.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-14 01:17:19 -05:00
kingbri
47343e2f1a OAI: Add models support
The models endpoint fetches all the models that OAI has to offer.
However, since this is an OAI clone, just list the models inside
the user's configured model directory instead.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-13 21:38:34 -05:00
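Listing local models instead of querying OAI, as the commit above describes, could look roughly like this; the directory-per-model layout and the function name are assumptions.

```python
from pathlib import Path

def list_models(model_dir: str) -> list:
    """Return the name of each subdirectory in the configured model dir.

    Each subdirectory is treated as one installed model, mirroring the
    shape of an OAI /v1/models listing.
    """
    return sorted(p.name for p in Path(model_dir).iterdir() if p.is_dir())
```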
kingbri
eee8b642bd OAI: Implement completion API endpoint
Add support for /v1/completions with the option to use streaming
if needed. Also rewrite API endpoints to use async when possible,
since that improves request performance.

Model container parameter names also needed rewrites, and fallback
cases were set to their disabled values.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-13 18:31:26 -05:00
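In the spirit of the streaming endpoint above, generation can be exposed as an async generator that yields text pieces as they are produced. This is a minimal, hypothetical sketch, not tabbyAPI's actual implementation.

```python
import asyncio

async def stream_completion(tokens):
    """Yield generated text piece by piece, as a streaming endpoint would."""
    for tok in tokens:
        await asyncio.sleep(0)  # yield control, as real generation would
        yield tok

async def collect(tokens):
    """Drain the stream into a list (stand-in for an SSE consumer)."""
    return [t async for t in stream_completion(tokens)]
```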
turboderp
4fa4386275 Add new samplers 2023-11-12 08:12:08 +01:00
kingbri
a10c14d357 Config: Switch to YAML and add load progress
YAML is a more flexible format when it comes to configuration. Command-line
arguments are difficult to remember and configure, especially for
an API with complicated option names. Rather than using half-baked
text files, implement a proper config solution.

Also add a progress bar when loading models on the command line.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-12 00:21:16 -05:00
kingbri
5d32aa02cd Tree: Update to use ModelContainer and args
Use command-line arguments to load an initial model if necessary.
API routes are broken, but we should be using the container from
now on as a primary interface with the exllama2 library.

Also these args should be turned into a YAML configuration file in
the future.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-10 23:19:54 -05:00
turboderp
9d34479e3e Model container with generator logic, initial 2023-11-11 02:53:00 +01:00