Commit Graph

10 Commits

Author SHA1 Message Date
kingbri
2a0aaa2e8a OAI: Add ability to pass extra vars in jinja templates
A chat completion can now declare extra template_vars to pass when
a template is rendered, opening up the possibility of using state
outside of huggingface's parameters.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-11 09:49:25 -04:00
kingbri
f9f8c97c6d Templates: Fix stop_string parsing
Template modules grab all set vars, including ones that use runtime
vars. If a template var is set to a runtime var and a module is created,
an UndefinedError fires.

Use make_module instead to pass runtime vars when creating a template
module.

Resolves #92

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-02 00:44:04 -04:00
kingbri
e8b6a02aa8 API: Move prompt template construction to utils
Best to move the inner workings within its inner function. Also fix
an edge case where stop strings can be a string rather than an array.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-29 02:24:13 -04:00
kingbri
6dfcbbd813 Common: Migrate request utils to networking
Helps organize the project better. Utils is meant to be for simple
functions like unwrap.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-21 23:21:57 -04:00
kingbri
07d9b7cf7b Model: Add abort on generation
When the model is processing a prompt, add the ability to abort
on request cancellation. This is also a catch for a SIGINT.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-20 15:21:37 -04:00
kingbri
2704ff8344 Tree: Format
Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-18 16:02:29 -04:00
kingbri
5c7fc69ded API: Fix finish_reason returns
OAI expects finish_reason to be "stop" or "length" (there are others,
but they're not in the current scope of this project).

Make all completions and chat completions responses return this
from the model generation itself rather than putting a placeholder.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-18 15:59:28 -04:00
kingbri
7fded4f183 Tree: Switch to async generators
Async generation helps remove many roadblocks to managing tasks
using threads. It should allow for abortables and modern-day paradigms.

NOTE: Exllamav2 itself is not an asynchronous library. It's just
been added into tabby's async nature to allow for a fast and concurrent
API server. It's still being debated to run stream_ex in a separate
thread or manually manage it using asyncio.sleep(0)

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-16 23:23:31 -04:00
kingbri
1ec8eb9620 Tree: Format
Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-13 00:02:55 -04:00
kingbri
6f03be9523 API: Split functions into their own files
Previously, generation function were bundled with the request function
causing the overall code structure and API to look ugly and unreadable.

Split these up and cleanup a lot of the methods that were previously
overlooked in the API itself.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-12 23:59:30 -04:00