Commit Graph

30 Commits

Author SHA1 Message Date
kingbri
39617adb65 Requirements: Update Exllamav2
v0.0.15

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-06 22:29:55 -05:00
kingbri
ccd41d720d Requirements: Bump ExllamaV2
v0.0.14

Signed-off-by: kingbri <bdashore3@proton.me>
2024-02-24 12:26:08 -05:00
kingbri
ea00a6bd45 Requirements: Update Exllamav2
Update to v0.0.13.post2

Signed-off-by: kingbri <bdashore3@proton.me>
2024-02-14 21:51:25 -05:00
kingbri
321c9a1ea9 Requirements: Fix FA2 version number
The URL wasn't edited correctly

Signed-off-by: kingbri <bdashore3@proton.me>
2024-02-07 21:37:30 -05:00
kingbri
d0027bce32 Requirements: Update flash attention 2 for Windows
Version 2.5.2

Signed-off-by: kingbri <bdashore3@proton.me>
2024-02-07 20:44:23 -05:00
kingbri
543a9b68c8 Requirements: Update Exllamav2 to 0.0.13.post1
Signed-off-by: kingbri <bdashore3@proton.me>
2024-02-04 21:25:57 -05:00
kingbri
6eeb62b82c Requirements: Update exllamav2, torch, and FA2
Torch to 2.2, exllamav2 to 0.0.13, FA2 to 2.4.2 on Windows and 2.5.2
on Linux.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-02-02 23:53:42 -05:00
kingbri
3605067898 Requirements: Don't use torch 2.2
PyTorch released 2.2 without letting the community know first. Pin
the torch version to 2.1.2 until exllamav2 builds for torch 2.2.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-01-29 23:30:10 -05:00
kingbri
ee99349a78 Requirements: Bump exllamav2
0.0.12

Signed-off-by: kingbri <bdashore3@proton.me>
2024-01-22 21:13:31 -05:00
kingbri
162c13752a Requirements: Update to Flash Attention 2.4.1
Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-25 14:40:08 -05:00
AlpinDale
6a5bbd217c feat: logging (#39)
* add logging

* simplify the logger

* formatting

* final touches

* fix format

* Model: Add log to metrics

Signed-off-by: kingbri <bdashore3@proton.me>

---------

Authored-by: AlpinDale <52078762+AlpinDale@users.noreply.github.com>
2023-12-23 04:33:31 +00:00
kingbri
da69ad8cd3 Requirements: Pin versions for some dependencies
Pydantic and Jinja2 need pinned versions.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-19 21:48:04 -05:00
kingbri
51ca1ff396 Tree: Switch to Pydantic 2
Pydantic 2 offers more modern methods and better stability than Pydantic 1.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-18 23:53:47 -05:00
kingbri
f631dd6ff7 Templates: Switch to Jinja2
Jinja2 is a lightweight template parser that's used in Transformers
for parsing chat completions. It's much more efficient than Fastchat
and can be imported as part of requirements.

Also allows for unblocking Pydantic's version.

Users now have to provide their own template if needed. A separate
repo may be usable for common prompt template storage.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-18 23:53:47 -05:00
kingbri
f196f1177d Requirements: Update exllamav2 to 0.0.11
Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-16 19:33:42 -05:00
kingbri
47176a2a1e Requirements: Fix torch install
Use --extra-index-url to install pytorch. This should be secure enough
since dependency confusion attacks aren't possible with just installing
the torch package.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-06 19:04:35 -05:00
kingbri
b83e1b704e Requirements: Split for configurations
Add self-contained requirements for cuda 11.8 and ROCm

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-06 00:00:30 -05:00
kingbri
621e11b940 Update documentation
Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-05 00:33:43 -05:00
kingbri
e740b53478 Requirements: Update Flash Attention 2
Bump to 2.3.6

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-03 01:56:29 -05:00
kingbri
ae69b18583 API: Use FastAPI streaming instead of sse_starlette
sse_starlette kept firing a ping response whenever an event took too
long to be set. Rather than using a hacky workaround, switch to
FastAPI's built-in streaming response and construct SSE messages with
a utility function.

This helps the API become more robust and removes an extra requirement.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-01 01:54:35 -05:00
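
The approach described in ae69b18583 can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the names `format_sse` and `stream_events` are hypothetical, and the sketch only assumes the standard Server-Sent Events wire format (one `data:` field per payload line, with a blank line ending each message). In a FastAPI route, a generator like this could presumably be wrapped in `StreamingResponse(..., media_type="text/event-stream")`.

```python
import json


def format_sse(data, event=None):
    """Serialize one payload as a Server-Sent Events message.

    Hypothetical helper: the name and signature are assumptions,
    not taken from the repository.
    """
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # Strings are sent as-is; anything else is JSON-encoded.
    payload = data if isinstance(data, str) else json.dumps(data)
    # Every line of the payload gets its own "data:" field.
    for line in payload.splitlines() or [""]:
        lines.append(f"data: {line}")
    # A blank line terminates the message.
    return "\n".join(lines) + "\n\n"


def stream_events(chunks):
    """Yield each chunk as an SSE message (hypothetical generator)."""
    for chunk in chunks:
        yield format_sse(chunk)
```

Because the generator only yields when a chunk is ready, nothing is sent in between, which avoids the periodic ping behaviour described in the commit message.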
kingbri
d25310e55d Requirements: Update Flash Attention 2
Use 2.3.4 from tgw. However, keep the 2.3.3 wheels in requirements
for now in case the newer wheels don't work.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-21 22:12:55 -05:00
kingbri
a51889bdb8 Requirements: Update Flash Attention
Bump to version 2.3.3.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-18 22:28:24 -05:00
Splice86
feef782dbf Update requirements.txt to include uvicorn
2023-11-16 22:50:27 +00:00
kingbri
b20e71dcd4 Requirements: Add Flash Attention 2 wheels
Update to 2.3.3 at some point.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-16 17:25:00 -05:00
kingbri
03f45cb0a3 Tree: Update documentation and configs
Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-16 02:30:33 -05:00
kingbri
2248705c4a Requirements: Don't force fastchat installation
Fastchat pulls in many heavy dependencies such as transformers, peft,
and accelerate. They are not useful unless a user wants to add a shim
for the chat completion endpoint.

Instead, try importing fastchat and print an error to the console if
the import fails.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-16 01:26:46 -05:00
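
The optional-import pattern described in 2248705c4a might look something like the sketch below. It is an assumption-laden illustration rather than the repository's code: `try_import` and `CHAT_COMPLETIONS_ENABLED` are hypothetical names, and only the standard-library `importlib.import_module` call is relied on.

```python
import importlib


def try_import(module_name):
    """Return the imported module, or None if it is not installed.

    Hypothetical helper: prints a notice instead of raising, so the
    server can still start without the optional dependency.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        print(f"Optional dependency '{module_name}' is unavailable: {exc}")
        return None


if __name__ == "__main__":
    # fastchat is only needed for the chat completion shim.
    fastchat = try_import("fastchat")
    CHAT_COMPLETIONS_ENABLED = fastchat is not None
    print(f"Chat completions enabled: {CHAT_COMPLETIONS_ENABLED}")
```

Keeping the import behind a helper like this means the heavy dependency tree is only required by users who actually want the chat completion endpoint.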
kingbri
1f444c8fb7 Requirements: Add fastchat and override pydantic
Use an older version of pydantic to stay compatible

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-15 01:00:08 -05:00
kingbri
eee8b642bd OAI: Implement completion API endpoint
Add support for /v1/completions with the option to use streaming
if needed. Also rewrite API endpoints to use async when possible
since that improves request performance.

Model container parameter names also needed rewrites, and fallback
cases are now set to their disabled values.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-13 18:31:26 -05:00
kingbri
a10c14d357 Config: Switch to YAML and add load progress
YAML is a more flexible format when it comes to configuration.
Command-line arguments are difficult to remember and configure,
especially for an API with complicated argument names. Rather than
using half-baked text files, implement a proper config solution.

Also add a progress bar on the command line when loading models.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-12 00:21:16 -05:00
david
b967e2e604 Initial
2023-11-09 21:27:45 -06:00