Commit Graph

66 Commits

Author SHA1 Message Date
kingbri
698b0b1976 Update README
Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-19 01:19:31 -05:00
kingbri
581e1fc219 Sample config: Remove unused value
Draft models are specified in the draft sub-block.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-19 01:16:03 -05:00
kingbri
e0e93c103b Sample config: Uncomment all parameters
This helps clarify things when users are configuring for the first
time. For example, some users were putting the model name in the
"model" block instead of the "model_name" field.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-19 01:12:07 -05:00
kingbri
63762654f0 Update README
Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-19 01:05:49 -05:00
Brian Dashore
e46676cb08 Merge pull request #9 from city-unit/main
Add basic docker support
2023-11-19 00:53:24 -05:00
kingbri
e4a8848445 Auth: Log API and admin key on startup
Helpful for users who run headless or use Docker.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-19 00:52:39 -05:00
kingbri
31bc418795 Model: Add context in response output
When printing to the console, give information about the context
(ingestion token count).

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-19 00:49:32 -05:00
city_unit
80c69939ae Remove unneeded stuffs
2023-11-19 00:34:54 -05:00
kingbri
f47919b1d3 API: Add draft model support
Models can be loaded with a child object called "draft" in the POST
request. Again, models need to be located within the draft model dir
to get loaded.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-19 00:32:25 -05:00
city_unit
6b22dc0119 Rename, fschat support
2023-11-19 00:32:14 -05:00
city_unit
99cf0b6d7b Add basic docker support
2023-11-19 00:01:17 -05:00
kingbri
6b9af58cc1 Tree: Fix extraneous bugs and update T/s print
Model: Add extra information to print and fix the divide by zero error.
Auth: Fix validation of API and admin keys to look for the entire key.

References #7 and #6

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-18 22:34:40 -05:00
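A minimal sketch of the two fixes described in 6b9af58cc1, not the project's actual code. The function names `tokens_per_second` and `key_matches` are hypothetical, and the use of `hmac.compare_digest` is an assumption: the commit only says the whole key must be checked, and a constant-time comparison is one common way to do that.

```python
import hmac


def tokens_per_second(generated_tokens: int, elapsed_seconds: float) -> float:
    """Return generation speed, or 0.0 when no measurable time has elapsed."""
    if elapsed_seconds <= 0:
        # Guard against the divide-by-zero case instead of raising.
        return 0.0
    return generated_tokens / elapsed_seconds


def key_matches(provided: str, expected: str) -> bool:
    """Accept only an exact match of the whole key, never a substring."""
    return hmac.compare_digest(provided.encode("utf-8"), expected.encode("utf-8"))
```

With this shape, a key that is merely a prefix or substring of the real key is rejected, and a generation that finishes in zero measured time reports 0.0 T/s rather than crashing.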
kingbri
a51889bdb8 Requirements: Update Flash Attention
Bump to version 2.3.3.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-18 22:28:24 -05:00
Brian Dashore
b2410a0436 Merge pull request #4 from waldfee/config_samples
Adds draft model support to config.yml
2023-11-18 13:16:23 -05:00
kingbri
27ebec3b35 Model: Add speculative decoding support via config
Speculative decoding makes use of draft models that ingest the prompt
before forwarding it to the main model.

Add options in the config to support this. API options will occur
in a different commit.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-18 01:42:20 -05:00
kingbri
2ad79cb9ea Model: Add tokens in responses
Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-17 23:33:48 -05:00
kingbri
7f18ea1d7c Tree: Remove SillyTavern shim docs
Support has been added in SillyTavern's staging branch.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-17 22:03:46 -05:00
kingbri
6f2078cbe4 Update README
Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-17 22:02:21 -05:00
kingbri
d627d14385 API: Fix exceptions and defaults
Stop conditions were None, causing the model to error out when trying to
add the EOS token to a None value.

Authentication failed when Bearer contained an empty string. To fix
this, add a condition which checks array length.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-17 17:56:05 -05:00
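The Bearer fix in d627d14385 can be sketched roughly as follows. This is an illustration under assumptions, not the repository's code: `extract_bearer_token` is a hypothetical helper, and it assumes the raw `Authorization` header value is passed in as a string or `None`.

```python
from typing import Optional


def extract_bearer_token(authorization: Optional[str]) -> Optional[str]:
    """Return the token from a 'Bearer <token>' header, or None if malformed."""
    if not authorization:
        return None
    # split() with no argument drops empty pieces, so "Bearer " yields one part.
    parts = authorization.split()
    # Check the array length before indexing so an empty token cannot slip through.
    if len(parts) != 2 or parts[0].lower() != "bearer":
        return None
    return parts[1]
```

A caller would treat a `None` result as an authentication failure rather than indexing into the split header directly.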
waldfee
78a6587b95 add cache_mode and draft_model_dir to config_sample.yml
2023-11-17 22:08:31 +01:00
kingbri
4669e49ff0 API: Fix errors with token endpoint
Handle None cases if the provided text/token lists are empty.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-17 01:39:06 -05:00
kingbri
9dfa580b1e Model: Add tokens/second output
Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-17 01:16:20 -05:00
kingbri
021981fce0 API: Re-add depends endpoints
Mistakenly removed API key authentication for the models endpoints in
testing.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-17 00:50:42 -05:00
kingbri
ac4e9c2277 API: Add CORS support
Tell CORS to go fly a kite.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-16 22:19:47 -05:00
kingbri
08a183540b Config: Add warning on exceptions and clarify parameters
Due to how YAML works, double quotes are bad. Specify a linter in
the top of the config_sample file.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-16 22:19:47 -05:00
Splice86
feef782dbf Update requirements.txt to include uvicorn
2023-11-16 22:50:27 +00:00
Brian Dashore
d5374c2c1f Create LICENSE
Use AGPLv3 for this project

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-16 17:43:23 -05:00
kingbri
2cf93c092b Add SillyTavern instructions
Temporary until proper support is added in.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-16 17:33:23 -05:00
kingbri
b20e71dcd4 Requirements: Add Flash Attention 2 wheels
Update to 2.3.3 at some point.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-16 17:25:00 -05:00
kingbri
d5551352bf Model: Fix parsing of stop conditions
Add the EOS token into stop strings after checking kwargs. If
ban_eos_token is on, don't add the EOS token in for extra measure.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-16 17:15:33 -05:00
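The stop-condition handling described in d5551352bf might look something like the sketch below. The function name `build_stop_conditions` and the `eos_token_id` parameter are hypothetical. The only behaviour taken from the commit message is that the EOS token is appended after the request arguments are read, and skipped when `ban_eos_token` is set.

```python
from typing import List, Optional, Union


def build_stop_conditions(
    stop: Optional[Union[str, List[str]]],
    eos_token_id: int,
    ban_eos_token: bool = False,
) -> list:
    """Normalise the requested stop strings and optionally append the EOS token."""
    if stop is None:
        conditions: list = []       # No stop strings requested: start from an empty list.
    elif isinstance(stop, str):
        conditions = [stop]         # A single stop string becomes a one-item list.
    else:
        conditions = list(stop)     # Copy so the caller's list is not mutated.

    if not ban_eos_token:
        conditions.append(eos_token_id)
    return conditions
```

Starting from an empty list when `stop` is `None` also avoids appending to a `None` value.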
kingbri
282b5b2931 API: Fix responses and some params
Responses were not being properly sent as JSON. Only run pydantic's
JSON function on stream responses. FastAPI does the rest with static
responses.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-16 17:11:55 -05:00
kingbri
d8d61fa19b API: Add fallback if model isn't loaded
Most endpoints require the model to be loaded, so add a depends.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-16 12:20:35 -05:00
kingbri
c0525c042e Update README
Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-16 12:06:37 -05:00
kingbri
60eb076b43 Tree: Basic formatting and comments
Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-16 11:48:40 -05:00
kingbri
5defb1b0b4 Config: Fix errors when stuff doesn't exist
Add safe fallbacks if any part of the config tree doesn't exist. This
prevents random internal server errors from showing up.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-16 11:41:03 -05:00
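One way to get the safe fallbacks that 5defb1b0b4 describes is a small lookup helper over the parsed YAML dictionary. This is a sketch under assumptions: `get_nested` is a hypothetical name, and the keys in the test values are illustrative only.

```python
from typing import Any


def get_nested(config: Any, *keys: str, default: Any = None) -> Any:
    """Walk nested dict keys, returning `default` if any level is missing or None."""
    current = config
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    # An empty YAML block parses to None, so treat that as missing too.
    return default if current is None else current
```

A lookup such as `get_nested(config, "model", "model_name")` then returns `None` instead of raising when the `model` block is absent, which is the kind of error that would otherwise surface as an internal server error.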
kingbri
03f45cb0a3 Tree: Update documentation and configs
Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-16 02:30:33 -05:00
kingbri
2248705c4a Requirements: Don't force fastchat installation
Fastchat requires a lot of dependencies such as transformers, peft,
and accelerate which are heavy. This is not useful unless a user
wants to add a shim for the chat completion endpoint.

Instead, try importing fastchat and notify the console of the error.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-16 01:26:46 -05:00
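The optional-dependency approach in 2248705c4a (try the import, report the failure on the console) can be sketched generically as below. `optional_import` is a hypothetical helper, and the project may instead use a plain try/except around `import fastchat`.

```python
import importlib
from types import ModuleType
from typing import Optional


def optional_import(name: str) -> Optional[ModuleType]:
    """Import a module if it is installed; otherwise log the error and return None."""
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        # Notify the console rather than failing startup.
        print(f"Optional dependency '{name}' is unavailable: {exc}")
        return None
```

Code that needs the chat completion shim could call `optional_import("fastchat")` once at startup and disable that endpoint when the result is `None`.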
kingbri
5e8419ec0c OAI: Add chat completions endpoint
Chat completions is the endpoint that will be used by OAI in the
future. Makes sense to support it even though the completions
endpoint will be used more often.

Also unify common parameters between the chat completion and completion
requests since they're very similar.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-16 01:06:07 -05:00
kingbri
593471a04d Auth: Fix init from YAML dict
A class can't have multiple constructors.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-15 23:00:12 -05:00
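Since a Python class cannot overload `__init__`, a common pattern for the situation 593471a04d mentions is an alternate constructor written as a classmethod. The sketch below assumes a hypothetical `AuthKeys` class with `api_key` and `admin_key` fields. It is not taken from the repository.

```python
from dataclasses import dataclass


@dataclass
class AuthKeys:
    api_key: str
    admin_key: str

    @classmethod
    def from_dict(cls, data: dict) -> "AuthKeys":
        """Build an instance from a dict such as one loaded from a YAML file."""
        return cls(api_key=data["api_key"], admin_key=data["admin_key"])
```

The regular constructor stays available for freshly generated keys, while `AuthKeys.from_dict(...)` handles the parsed YAML case.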
kingbri
1f444c8fb7 Requirements: Add fastchat and override pydantic
Use an older version of pydantic to stay compatible

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-15 01:00:08 -05:00
kingbri
bbb59d0747 Auth: Fix methods for writing and validation
These were not working properly. Write the YAML file correctly and
make the validator return a 401 when the bearer token is invalid.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-15 00:55:15 -05:00
kingbri
cb8da7f092 Chore: Remove mistakenly committed file
Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-15 00:55:15 -05:00
kingbri
d0b6b11068 OAI: Make freq and presence pen floats
Also rename the completions typing file.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-15 00:55:15 -05:00
kingbri
126afdfdc2 Model: Fix gpu split params
GPU split auto is a bool and GPU split is an array of integers for
GBs to allocate per GPU.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-15 00:55:15 -05:00
kingbri
ea91d17a11 Api: Add ban_eos_token and add_bos_token support
Adds the ability for the client to specify whether to add the BOS
token and ban the EOS token.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-15 00:55:15 -05:00
kingbri
8fea5391a8 Api: Add token endpoints
Support for encoding and decoding with various parameters.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-15 00:55:15 -05:00
kingbri
2d741653c3 Update .gitignore
Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-15 00:55:15 -05:00
Splice86
fc14046318 Updated readme
2023-11-14 21:17:03 -06:00
Splice86
4fd7da8fb6 Updated readme
2023-11-14 21:16:24 -06:00
Splice86
a0cf65e88f Updated readme
2023-11-14 21:13:36 -06:00