Commit Graph

13 Commits

Author SHA1 Message Date
kingbri
56fdfb5f8e OAI: Add stream to gen params
Good for logging.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-21 00:55:44 -04:00
kingbri
07d9b7cf7b Model: Add abort on generation
When the model is processing a prompt, add the ability to abort
on request cancellation. This is also a catch for a SIGINT.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-20 15:21:37 -04:00
kingbri
2704ff8344 Tree: Format
Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-18 16:02:29 -04:00
kingbri
5c7fc69ded API: Fix finish_reason returns
OAI expects finish_reason to be "stop" or "length" (there are others,
but they're not in the current scope of this project).

Make all completions and chat completions responses return this
from the model generation itself rather than putting a placeholder.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-18 15:59:28 -04:00
kingbri
25f5d4a690 API: Cleanup permission endpoint
Don't return an OAI specific type from a common file.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-18 15:13:26 -04:00
kingbri
3c08f46c51 Endpoints: Add key permission checker
This is a definite way to check if an authorized key is API or admin.
The endpoint only runs if the key is valid in the first place to keep
inline with the API's security model.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-18 00:53:27 -04:00
kingbri
14d8ec2007 Signal: Fix signal handlers for uvicorn
Add the ability to override uvicorn's signal handler in addition
to using main's signal handler for any SIGINTs before the API server
starts.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-16 23:23:31 -04:00
kingbri
2755fd1af0 API: Fix blocking iterator execution
Run these iterators on the background thread. On startup, the API
spawns a background thread as needed to run sync code on without blocking
the event loop.

Use asyncio's run_thread function since it allows for errors to be
propegated.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-16 23:23:31 -04:00
kingbri
7fded4f183 Tree: Switch to async generators
Async generation helps remove many roadblocks to managing tasks
using threads. It should allow for abortables and modern-day paradigms.

NOTE: Exllamav2 itself is not an asynchronous library. It's just
been added into tabby's async nature to allow for a fast and concurrent
API server. It's still being debated to run stream_ex in a separate
thread or manually manage it using asyncio.sleep(0)

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-16 23:23:31 -04:00
kingbri
33e2df50b7 API: Disable SSE ping chunks
These are mainly used for some clients that ping to see if the request
is alive. However, we don't need this.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-14 20:47:05 -04:00
kingbri
1ec8eb9620 Tree: Format
Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-13 00:02:55 -04:00
kingbri
6f03be9523 API: Split functions into their own files
Previously, generation function were bundled with the request function
causing the overall code structure and API to look ugly and unreadable.

Split these up and cleanup a lot of the methods that were previously
overlooked in the API itself.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-12 23:59:30 -04:00
kingbri
104a6121cb API: Split into separate folder
Moving the API into its own directory helps compartmentalize it
and allows for cleaning up the main file to just contain bootstrapping
and the entry point.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-12 23:59:30 -04:00