Commit Graph

11 Commits

Author SHA1 Message Date
kingbri
ae69b18583 API: Use FastAPI streaming instead of sse_starlette
sse_starlette kept firing a ping response whenever an event took too
long to be set. Rather than using a hacky workaround, switch to
FastAPI's built-in streaming response and construct SSE messages with
a utility function.

This helps the API become more robust and removes an extra requirement.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-01 01:54:35 -05:00
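
The utility function from ae69b18583 is not shown in this log. As a rough sketch of the idea only, an SSE message can be built by hand and yielded from a generator. The names `format_sse` and `stream_events` and the `{"text": ...}` payload shape are hypothetical, and the closing `[DONE]` sentinel is an assumption borrowed from OpenAI-style streams. In a FastAPI app, a generator like this would typically be passed to `StreamingResponse` with `media_type="text/event-stream"`.

```python
import json
from typing import Iterable, Iterator


def format_sse(payload: dict) -> str:
    """Serialize a payload as one SSE message (hypothetical helper)."""
    # An SSE message is a "data:" line terminated by a blank line.
    # json.dumps escapes newlines, so the payload stays on one line.
    return f"data: {json.dumps(payload)}\n\n"


def stream_events(chunks: Iterable[str]) -> Iterator[str]:
    """Yield one SSE message per generated text chunk (hypothetical)."""
    for chunk in chunks:
        yield format_sse({"text": chunk})
    # Assumption: end the stream with an OpenAI-style sentinel.
    yield "data: [DONE]\n\n"
```

Because the generator only yields when a chunk is ready, nothing is sent in between, which is the behavior the commit message asks for in place of periodic pings.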
kingbri
d25310e55d Requirements: Update Flash Attention 2
Use 2.3.4 from tgw. For now, keep the 2.3.3 wheels in requirements
in case the newer wheels don't work.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-21 22:12:55 -05:00
kingbri
a51889bdb8 Requirements: Update Flash Attention
Bump to version 2.3.3.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-18 22:28:24 -05:00
Splice86
feef782dbf Update requirements.txt to include uvicorn 2023-11-16 22:50:27 +00:00
kingbri
b20e71dcd4 Requirements: Add Flash Attention 2 wheels
Update to 2.3.3 at some point.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-16 17:25:00 -05:00
kingbri
03f45cb0a3 Tree: Update documentation and configs
Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-16 02:30:33 -05:00
kingbri
2248705c4a Requirements: Don't force fastchat installation
Fastchat pulls in many heavy dependencies such as transformers, peft,
and accelerate. These are not useful unless a user wants to add a
shim for the chat completion endpoint.

Instead, try importing fastchat and print a notice to the console if
the import fails.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-16 01:26:46 -05:00
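
The optional-import approach described in 2248705c4a can be sketched roughly as follows. The helper name `optional_import` and the wording of the notice are hypothetical; the actual commit may handle the failure differently.

```python
import importlib
from types import ModuleType
from typing import Optional


def optional_import(name: str) -> Optional[ModuleType]:
    """Return the module if it is installed, else print a notice."""
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        # ModuleNotFoundError is a subclass of ImportError, so a
        # missing package lands here instead of crashing startup.
        print(f"Optional dependency '{name}' is unavailable: {exc}")
        return None
```

A caller could then write `fastchat = optional_import("fastchat")` and check for `None` before enabling the chat completion shim.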
kingbri
1f444c8fb7 Requirements: Add fastchat and override pydantic
Use an older version of pydantic to stay compatible.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-15 01:00:08 -05:00
kingbri
eee8b642bd OAI: Implement completion API endpoint
Add support for /v1/completions with the option to use streaming
if needed. Also rewrite API endpoints to use async when possible
since that improves request performance.

Model container parameter names also needed rewrites, and fallback
cases are now set to their disabled values.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-13 18:31:26 -05:00
kingbri
a10c14d357 Config: Switch to YAML and add load progress
YAML is a more flexible format when it comes to configuration.
Command-line arguments are difficult to remember and configure,
especially for an API with complicated argument names. Rather than
using half-baked text files, implement a proper config solution.

Also add a progress bar when loading models from the command line.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-12 00:21:16 -05:00
david
b967e2e604 Initial 2023-11-09 21:27:45 -06:00