15 Commits

Author SHA1 Message Date
turboderp
79d581e1f5 OAI endpoints: More rework
- remove disconnect_task
- move disconnect logic to a per-request handler that wraps the cleanup operation and polls the request state directly, with throttling
- exclusively signal disconnect with CancelledError
- rework completions endpoint to follow same approach as chat completions, share some code
- refactor OAI endpoints a bit
- correct behavior for batched completion requests
- make sure logprobs work for completion and streaming completion requests
- more tests
2026-04-02 01:26:44 +02:00
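
The per-request handler described in the commit above could look roughly like the following. This is a minimal sketch using only asyncio, not the project's actual code: `run_with_disconnect_poll`, `is_disconnected`, `cleanup`, and `poll_interval` are hypothetical names chosen for illustration.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def run_with_disconnect_poll(
    work: Awaitable[T],
    is_disconnected: Callable[[], Awaitable[bool]],
    cleanup: Callable[[], Awaitable[None]],
    poll_interval: float = 0.1,
) -> T:
    """Hypothetical handler: run `work`, poll the request state at most once
    per `poll_interval`, and signal a disconnect only via CancelledError."""
    task = asyncio.ensure_future(work)
    try:
        while True:
            # Wait for the work to finish, but wake up every poll_interval
            # seconds so the disconnect check is throttled rather than busy.
            done, _ = await asyncio.wait({task}, timeout=poll_interval)
            if done:
                return task.result()
            if await is_disconnected():
                raise asyncio.CancelledError("client disconnected")
    finally:
        # Cleanup runs exactly once, whether the work finished, failed,
        # or the client went away.
        if not task.done():
            task.cancel()
            await asyncio.gather(task, return_exceptions=True)
        await cleanup()
```

In a web framework, `is_disconnected` would typically be backed by the request object's own disconnect check; that wiring is assumed here and not shown.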
turboderp
aa54098f26 Ruff: Format (line length) 2026-03-30 00:19:07 +02:00
turboderp
2a1503b283 Logging: Use debug level for Seq instead of verbose 2026-03-29 18:51:57 +02:00
turboderp
f3787de6a6 Ruff: Format 2026-03-27 21:47:24 +01:00
turboderp
83127ab4f8 Logging: Log messages via Seq wrapper 2026-03-27 21:38:47 +01:00
TerminalMan
c6f9806ec6 remove unused imports 2024-09-11 18:00:29 +01:00
Jake
362b8d5818 config is now backed by pydantic (WIP)
- add models for config options
- add function to regenerate config.yml
- replace references to config with pydantic compatible references
- remove unnecessary unwrap() statements

TODO:

- auto generate env vars
- auto generate argparse
- test loading a model
2024-09-05 18:04:56 +01:00
kingbri
93872b34d7 Config: Migrate to global class instead of dicts
The config categories can have defined separation while preserving
the dynamic nature of adding new config options, by making all the
internal class vars dictionaries.

This was necessary because stored global callbacks captured the state
of the previous global_config var, which wasn't populated.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-09-04 23:18:47 -04:00
kingbri
3826815edb API: Add request logging
Log all the parts of a request if the config flag is set. The logged
fields are all server-side anyway, so nothing is exposed to
clients.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-22 21:40:00 -04:00
kingbri
0eedc8ca14 API: Switch from request ID middleware to depends
Middleware runs on both the request and the response. As a result,
streaming responses had increased latency when processing tasks and
sending data to the client, which resulted in erratic streaming behavior.

Use a depends to add request IDs since it only executes when the
request is run rather than expecting the response to be sent as well.

For the future, it would be best to think about limiting the time
between each tick of chunk data to be safe.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-22 12:19:46 -04:00
kingbri
e20a2d504b API: Fix pydantic validation errors on disconnect poll returns
Raise a 422 exception for the disconnect. This prevents pydantic
errors when returning a "response" which doesn't contain anything
in this case.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-15 14:41:49 -04:00
kingbri
6019c93637 Networking: Gate sending tracebacks over the API
Tracebacks can reveal too much information about a system when sent
over the API. Gate this under a flag so they are sent only when
debugging, since the feature is still useful.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-14 10:30:11 -04:00
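
A minimal sketch of gating tracebacks behind a flag, as the commit above describes. The function name `build_error_payload`, the `send_tracebacks` parameter, and the payload layout are assumptions made for illustration, not the project's actual API.

```python
import traceback


def build_error_payload(exc: Exception, send_tracebacks: bool = False) -> dict:
    """Hypothetical helper: always include the error message, but attach the
    full traceback only when the (assumed) debug flag is enabled."""
    payload = {"error": {"message": str(exc)}}
    if send_tracebacks:
        # format_exception returns a list of strings; join them into one block.
        payload["error"]["trace"] = "".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__)
        )
    return payload
```

With the flag off, clients see only the message; the traceback can still be written to server-side logs.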
kingbri
c474076b22 Concurrency: Remove release_semaphore method
At any point for any request cancellation, the semaphore will be
decremented. This is an issue since an arbitrary request can desync
the semaphore, causing multiple tasks to be processed at once and
breaking generation.

Remove this from the networking handlers and, as a result, remove the
release_semaphore function itself.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-05-19 10:42:26 -04:00
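
The desync described in the commit above can be reproduced with a plain asyncio.Semaphore. This is an illustrative sketch, not the project's code: `peak_concurrency` and `generate` are hypothetical names, and the stray `release()` stands in for a cancelled request that never acquired the semaphore.

```python
import asyncio


async def peak_concurrency(stray_release: bool) -> int:
    """Return the highest number of 'generation' tasks that ran at once."""
    sem = asyncio.Semaphore(1)  # intended to allow one generation at a time
    running = 0
    peak = 0

    async def generate() -> None:
        nonlocal running, peak
        async with sem:  # acquire and release are always paired here
            running += 1
            peak = max(peak, running)
            await asyncio.sleep(0.01)  # stand-in for generation work
            running -= 1

    if stray_release:
        # An unmatched release bumps the internal counter past its initial
        # value, so two tasks can now hold the semaphore simultaneously.
        sem.release()

    await asyncio.gather(generate(), generate())
    return peak
```

Letting `async with` own both acquire and release, rather than calling a separate release helper from cancellation handlers, keeps the counter consistent.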
kingbri
ed7cd3cb59 Network: Fix socket check timeout
Make this a one second timeout to check if a socket is connected.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-22 21:33:41 -04:00
kingbri
6dfcbbd813 Common: Migrate request utils to networking
Helps organize the project better. Utils is meant for simple
functions like unwrap.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-21 23:21:57 -04:00