
10 Commits

Author SHA1 Message Date
kingbri
c67c9f6d66 Model + Config: Remove low_mem option
low_mem doesn't work with exl2, and it was an experimental option to
begin with. Keep the loading code commented out in case it gets fixed
in the future.

A better alternative is to use the 8-bit cache, which works and helps
save VRAM.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-03 01:07:42 -05:00
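The VRAM saving from an 8-bit cache can be estimated with simple arithmetic. The sketch below assumes the usual layout of one K and one V tensor per layer, treats FP16 as 2 bytes and the 8-bit cache as 1 byte per element, and ignores any scaling metadata a real quantized cache may store. The model shape and the function name `kv_cache_bytes` are hypothetical.

```python
def kv_cache_bytes(num_layers: int, seq_len: int, num_kv_heads: int,
                   head_dim: int, bytes_per_element: int) -> int:
    """Approximate KV cache size: one K and one V tensor per layer."""
    elements = 2 * num_layers * seq_len * num_kv_heads * head_dim
    return elements * bytes_per_element


# Hypothetical model shape (roughly 7B-class) at a 4096-token context.
shape = dict(num_layers=32, seq_len=4096, num_kv_heads=32, head_dim=128)

fp16 = kv_cache_bytes(**shape, bytes_per_element=2)  # 16-bit cache
fp8 = kv_cache_bytes(**shape, bytes_per_element=1)   # 8-bit cache (no scale overhead)

print(f"FP16 cache:  {fp16 / 2**30:.1f} GiB")  # prints 2.0 GiB for this shape
print(f"8-bit cache: {fp8 / 2**30:.1f} GiB")   # prints 1.0 GiB for this shape
```

Under these assumptions the 8-bit cache halves the cache footprint, and the saving grows linearly with context length.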
kingbri
6493b1d2aa OAI: Add ability to send dummy models
Some APIs require an OAI model to be returned by the models endpoint.
Fix this by adding a GPT-3.5 Turbo entry as the first item in the list
to cover as many APIs as possible.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-01 00:27:28 -05:00
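A minimal sketch of the idea, assuming the OpenAI-style list response shape (`{"object": "list", "data": [...]}`) for the models endpoint. The function name `build_model_list`, the `add_dummy` flag and the `owned_by` value are hypothetical placeholders, not the project's actual code.

```python
def build_model_list(local_models: list[str], add_dummy: bool = True) -> dict:
    """Return an OpenAI-style models list response body (hypothetical helper)."""
    data = []
    if add_dummy:
        # Placeholder OAI entry goes first so clients that look for a
        # known OpenAI model name in the list find one.
        data.append({"id": "gpt-3.5-turbo", "object": "model", "owned_by": "local"})
    # The real, locally available models follow the dummy entry.
    data.extend(
        {"id": name, "object": "model", "owned_by": "local"} for name in local_models
    )
    return {"object": "list", "data": data}
```

For example, `build_model_list(["my-local-model"])` returns a list whose first entry is `gpt-3.5-turbo`, followed by `my-local-model` (a hypothetical name).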
kingbri
581e1fc219 Sample config: Remove unused value
Draft models are specified in the draft sub-block.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-19 01:16:03 -05:00
kingbri
e0e93c103b Sample config: Uncomment all parameters
This helps clarify things when users are configuring for the first
time. For example, some users were putting the model name in the
"model" block instead of the "model_name" field.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-19 01:12:07 -05:00
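The mistake described above can be caught once the YAML has been parsed into a dictionary. This is a hedged sketch that assumes the parsed config has a `model` block containing a `model_name` field; the function `check_model_section` and its messages are hypothetical. Note that a `model:` key whose parameters are all commented out parses as `None`, which is one reason uncommented defaults are clearer.

```python
def check_model_section(config: dict) -> list[str]:
    """Return human-readable problems found in the parsed "model" block."""
    problems = []
    model = config.get("model")
    if isinstance(model, str):
        # Common mistake: the model name was written as the value of "model"
        # instead of inside the block's "model_name" field.
        problems.append(
            f'"model" should be a block; move "{model}" into model.model_name'
        )
    elif isinstance(model, dict):
        if not model.get("model_name"):
            problems.append("model.model_name is empty or missing")
    else:
        # Missing key, or "model:" with every child line commented out (None).
        problems.append('missing or empty "model" block')
    return problems
```

A correctly filled block such as `{"model": {"model_name": "MyModel"}}` produces no problems, while `{"model": "MyModel"}` produces one.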
kingbri
27ebec3b35 Model: Add speculative decoding support via config
Speculative decoding makes use of a smaller draft model that proposes
candidate tokens, which the main model then verifies.

Add options in the config to support this. API options will be added
in a separate commit.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-18 01:42:20 -05:00
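The draft-then-verify loop can be illustrated with a greedy-only sketch. It assumes "models" are plain functions that map a token list to the next token. This is not exllamav2's implementation: sampling-based variants use a probabilistic acceptance rule, and a real target model scores all proposed positions in one batched forward pass. The names `speculative_greedy` and `NextToken` are hypothetical.

```python
from typing import Callable, List

# A greedy "model": given the context, return the next token id.
NextToken = Callable[[List[int]], int]


def speculative_greedy(target: NextToken, draft: NextToken, prompt: List[int],
                       max_new_tokens: int, k: int = 4) -> List[int]:
    """Generate tokens that match greedy decoding with `target` alone."""
    tokens = list(prompt)
    end = len(prompt) + max_new_tokens
    while len(tokens) < end:
        # 1. The cheap draft model proposes up to k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(min(k, end - len(tokens))):
            proposal.append(draft(tokens + proposal))
        # 2. The target model checks each proposed position in order.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(tokens + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                # First mismatch: keep the target's token, discard the rest.
                accepted.append(expected)
                break
        tokens.extend(accepted)
    return tokens[len(prompt):]
```

Every emitted token is either confirmed by or produced by the target model, so the output is identical to plain greedy decoding with the target. The speed-up comes from accepting several draft tokens per verification step when the draft model guesses well.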
waldfee
78a6587b95 add cache_mode and draft_model_dir to config_sample.yml
2023-11-17 22:08:31 +01:00
kingbri
08a183540b Config: Add warning on exceptions and clarify parameters
Due to how YAML works, double-quoted strings can cause problems (for
example, backslashes inside them are treated as escape characters).
Specify a linter at the top of the config_sample file.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-16 22:19:47 -05:00
kingbri
03f45cb0a3 Tree: Update documentation and configs
Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-16 02:30:33 -05:00
kingbri
b625bface9 OAI: Add API-based model loading/unloading and auth routes
Models can be loaded and unloaded via the API. Also add authentication
for regular API use and for administrator tasks.

The two types of authorization use separate keys.

Also fix the unload function to properly free all used VRAM.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-14 01:17:19 -05:00
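A minimal sketch of two-key authorization using only the standard library. The class `KeyStore`, its method names, and the rule that the admin key is also accepted for regular requests are assumptions for illustration, not the project's actual behavior.

```python
import hmac
import secrets


class KeyStore:
    """Hypothetical holder for two independent keys: API and admin."""

    def __init__(self) -> None:
        # Random hex keys; a real server would persist these somewhere.
        self.api_key = secrets.token_hex(16)
        self.admin_key = secrets.token_hex(16)

    def check(self, provided: str, admin_required: bool = False) -> bool:
        # compare_digest does a constant-time comparison to avoid leaking
        # key contents through timing; comparing bytes avoids errors on
        # non-ASCII input.
        given = provided.encode()
        if hmac.compare_digest(given, self.admin_key.encode()):
            return True  # assumption: the admin key also works for regular routes
        if admin_required:
            return False  # admin routes (e.g. model load/unload) need the admin key
        return hmac.compare_digest(given, self.api_key.encode())
```

Keeping the keys separate means a leaked API key cannot be used to load or unload models.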
kingbri
a10c14d357 Config: Switch to YAML and add load progress
YAML is a more flexible format when it comes to configuration. Command-line
arguments are difficult to remember and configure, especially for
an API with complicated command-line names. Rather than using half-baked
text files, implement a proper config solution.

Also add a progress bar when loading models in the command line.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-12 00:21:16 -05:00