Commit Graph

99 Commits

Author SHA1 Message Date
kingbri
fb903ecddf OAI: Relax role requirement for chat completion message lists
Make it so any message role can be parsed from a list. Not really
sure why this is the case because system and assistant shouldn't be
sending data other than text, but it also doesn't make much sense
to be extremely strict with roles either.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-09-22 22:42:06 -04:00
kingbri
daa57ceada API: Upgrade config declarations
Some were using the old unwrap methods.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-09-17 00:42:39 -04:00
TerminalMan
c6f9806ec6 remove unused imports 2024-09-11 18:00:29 +01:00
TerminalMan
e8fcecd56a Merge remote-tracking branch 'upstream/main' into HEAD 2024-09-11 15:57:18 +01:00
kingbri
e00eb09ef3 OAI: Add cancellation with inline load
When the request is cancelled, cancel the load task. In addition,
when checking if a model container exists, also check if the model
is fully loaded.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-09-11 00:08:55 -04:00
Cohee
63476041d1 Properly specify config value in the error message 2024-09-08 22:02:49 +03:00
kingbri
2f45e978c5 API: Fix merge overwrite
The completions utils did not take the new imports.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-09-05 18:04:53 -04:00
Brian Dashore
ec7f64d530 Merge pull request #185 from SecretiveShell/refactor-config-loading
Refactor config loading
2024-09-05 18:00:32 -04:00
Jake
362b8d5818 config is now backed by pydantic (WIP)
- add models for config options
- add function to regenerate config.yml
- replace references to config with pydantic compatible references
- remove unnecessary unwrap() statements

TODO:

- auto generate env vars
- auto generate argparse
- test loading a model
2024-09-05 18:04:56 +01:00
kingbri
93872b34d7 Config: Migrate to global class instead of dicts
The config categories can have defined separation, but preserve
the dynamic nature of adding new config options by making all the
internal class vars as dictionaries.

This was necessary since storing global callbacks stored a state
of the previous global_config var that wasn't populated.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-09-04 23:18:47 -04:00
kingbri
9c10789ca1 API: Error on invalid key permissions and cleanup format
If a user requesting a model change isn't admin, error.

Better to place the load function before the generate functions.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-09-04 21:44:14 -04:00
kingbri
21f14d4318 API: Update inline load
- Add a config flag
- Migrate support to /v1/completions
- Unify the load function

Signed-off-by: kingbri <bdashore3@proton.me>
2024-09-03 23:37:28 -04:00
kingbri
dd30d6592a Merge branch 'main' of https://github.com/theroyallab/tabbyapi into inline 2024-09-03 18:03:17 -04:00
kingbri
dd55b99af5 Model: Store directory paths
Storing a pathlib type makes it easier to manipulate the model
directory path in the long run without constantly fetching it
from the config.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-31 22:59:56 -04:00
Ben Gitter
045bc98333 Remove rouge print statements within chat_completion.py (#174)
* rouge prompt print

* remove print pt2

* Print Removal Final
2024-08-23 21:28:37 -04:00
kingbri
a51acb9db4 Templates: Switch to async jinja engine
This prevents any possible blocking of the event loop due to template
rendering.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-17 12:03:41 -04:00
kingbri
b4752c1e62 Templates: Revert to load metadata on runtime
Metadata is generated via a template's module. This requires a single
iteration through the template. If a template tries to access a passed
variable that doesn't exist, it will error.

Therefore, generate the metadata at runtime to prevent these errors
from happening. To optimize further, cache the metadata after the
first generation to prevent the expensive call of making a template
module.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-17 11:44:42 -04:00
Ben Gitter
70b9fc95de [WIP] OpenAI Tools Support/Function calling (#154)
* returning stop str if exists from gen

* added chat template for firefunctionv2

* pulling tool vars from template

* adding parsing for tool inputs/outputs

* passing tool data from endpoint to chat template, adding tool_start to the stop list

* loosened typing on the response tool call, leaning more on the user supplying a quality schema if they want a particular format

* non streaming generation prototype

* cleaning template

* Continued work with type, ingestion into template, and chat template for fire func

* Correction - streaming toolcall comes back as delta obj not inside chatcomprespchoice per chat_completion_chunk.py inside OAI lib.

* Ruff Formating

* Moved stop string and tool updates out of prompt creation func

Updated tool pydantic to match OAI

Support for streaming

Updated generate tool calls to use flag within chat_template and insert tool reminder

* Llama 3.1 chat templates

Updated fire func template

* renamed llama3.1 to chatml_with_headers..

* update name of template

* Support for calling a tool start token rather than the string.

Simplified tool_params

Warning when gen_settings are being overidden becuase user set temp to 0

Corrected schema and tools to correct types for function args. Str for some reason

* draft groq tool use model template

* changed headers to vars for readablity (but mostly because some models are weird about newlines after headers, so this is an easier way to change globally)

* Clean up comments and code in chat comp

* Post processed tool call to meet OAI spec rather than forcing model to write json in a string in the middle of the call.

* changes example back to args as json rather than string of json

* Standardize chat templates to each other

* cleaning/rewording

* stop elements can also be ints (tokens)

* Cleaning/formatting

* added special tokens for tools and tool_response as specified in description

* Cleaning

* removing aux templates - going to live in llm-promp-templates repo instead

* Tree: Format

Signed-off-by: kingbri <bdashore3@proton.me>

* Chat Completions: Don't include internal tool variables in OpenAPI

Use SkipJsonSchema to supress inclusion with the OpenAPI JSON. The
location of these variables may need to be changed in the future.

Signed-off-by: kingbri <bdashore3@proton.me>

* Templates: Deserialize metadata on template load

Since we're only looking for specific template variables that are
static in the template, it makes more sense to render when the template
is initialized.

Signed-off-by: kingbri <bdashore3@proton.me>

* Tools: Fix comments

Adhere to the format style of comments in the rest of the project.

Signed-off-by: kingbri <bdashore3@proton.me>

---------

Co-authored-by: Ben Gitter <gitterbd@gmail.com>
Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-17 00:16:25 -04:00
Bartowski
c75e911f07 Merge branch 'main' into main 2024-08-14 16:16:15 -04:00
Brian Dashore
1bf062559d Merge pull request #158 from AlpinDale/embeddings
feat: add embeddings support via Infinity-emb
2024-07-31 20:33:12 -04:00
kingbri
f13d0fb8b3 Embeddings: Add model load checks
Same as the normal model container.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-30 11:17:36 -04:00
kingbri
fbf1455db1 Embeddings: Migrate and organize Infinity
Use Infinity as a separate backend and handle the model within the
common module. This separates out the embeddings model from the endpoint
which allows for model loading/unloading in core.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-30 11:00:23 -04:00
kingbri
ac1afcc588 Embeddings: Use response classes instead of dicts
Follows the existing code style.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-29 14:15:40 -04:00
kingbri
3f21d9ef96 Embeddings: Switch to Infinity
Infinity-emb is an async batching engine for embeddings. This is
preferable to sentence-transformers since it handles scalable usecases
without the need for external thread intervention.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-29 13:42:03 -04:00
kingbri
c9a5d2c363 OAI: Refactor embeddings
Move files and rewrite routes to adhere to Tabby's code style.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-28 14:10:51 -04:00
kingbri
2773517a16 API: Add setup function to routers
This helps prepare the router before exposing it to the parent app.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-26 22:24:33 -04:00
Brian Dashore
6365427d38 Merge pull request #155 from Vhallo/main
Simple Typo Fix
2024-07-26 21:35:50 -04:00
kingbri
884b6f5ecd API: Add log options for initialization
Make each API log their respective URLs to help inform users.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-26 21:32:05 -04:00
AlpinDale
5adfab1cbd ruff: formatting 2024-07-26 02:53:14 +00:00
AlpinDale
f20cd330ef feat: add embeddings support via sentence-transformers 2024-07-26 02:45:07 +00:00
Vhallo
b2064bbfb4 Typo fix in completion.py 2024-07-23 23:49:43 +02:00
Vhallo
88e4b108b4 Typo fix in chat_completion.py 2024-07-23 23:48:50 +02:00
kingbri
9ad69e8ab6 API: Migrate universal routes to core
Place OAI specific routes in the appropriate folder. This is in
preperation for adding new API servers that can be optionally enabled.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-23 14:08:48 -04:00
kingbri
64c2cc85c9 OAI: Migrate model depends into proper file
Use amongst multiple routers.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-23 13:59:56 -04:00
kingbri
d1706fb067 OAI: Remove double logging if request is cancelled
Uvicorn can log in both the request disconnect handler and the
CancelledError. However, these sometimes don't work and both
need to be checked. But, don't log twice if one works.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-22 21:48:59 -04:00
kingbri
cae94b920c API: Add ability to use request IDs
Identify which request is being processed to help users disambiguate
which logs correspond to which request.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-21 21:01:05 -04:00
kingbri
38185a1ff4 Auth: Fix key check coalesce
Prefer the auth-specific headers before the generic authorization
header.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-19 10:08:57 -04:00
kingbri
c1b61441f4 OAI: Fix usage chunk return
Place the logic into their proper utility functions and cleanup
the code with formatting.

Also, OAI's docs specify that a [DONE] return is needed when everything
is finished.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-12 14:37:20 -04:00
Volodymyr Kuznetsov
b149d3398d OAI: support stream_options argument 2024-07-11 18:37:50 -07:00
kingbri
9fc3fc4c54 OAI: Amend comments
Clarify what the user can and can't see.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-11 14:22:50 -04:00
kingbri
1f46a1130c OAI: Restrict list permissions for API keys
API keys are not allowed to view all the admin's models, templates,
draft models, loras, etc. Basically anything that can be viewed
on the filesystem outside of anything that's currently loaded is
not allowed to be returned unless an admin key is present.

This change helps preserve user privacy while not erroring out on
list endpoints that the OAI spec requires.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-11 14:22:50 -04:00
kingbri
dfb4c51d5f OAI: Fix function idioms
Make functions mean the same thing to avoid confusion.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-11 14:22:50 -04:00
kingbri
b9a58ff01b Auth: Make key permission check work on Requests
Pass a request and internally unwrap the headers. In addition, allow
X-admin-key to get checked in an API key request.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-11 14:22:49 -04:00
Colin Kealty
279e900ea5 Add on the fly model loading to requests 2024-07-11 10:52:10 -04:00
kingbri
5c293499bd OAI: Reorder functions
Reordering routes changes the order of appearance on documentation.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-08 15:27:08 -04:00
kingbri
521d21b9f2 OAI: Add return types for docs
Adding return types allows for responses to get included in the
autogenerated docs.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-08 15:23:41 -04:00
kingbri
27d2d5f3d2 Config + Model: Allow for default fallbacks from config for model loads
Previously, the parameters under the "model" block in config.yml only
handled the loading of a model on startup. This meant that any subsequent
API request required each parameter to be filled out or use a sane default
(usually defaults to the model's config.json).

However, there are cases where admins may want an argument from the
config to apply if the parameter isn't provided in the request body.
To help alleviate this, add a mechanism that works like sampler overrides
where users can specify a flag that acts as a fallback.

Therefore, this change both preserves the source of truth of what
parameters the admin is loading and adds some convenience for users
that want customizable defaults for their requests.

This behavior may change in the future, but I think it solves the
issue for now.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-06 17:50:58 -04:00
DocShotgun
156b74f3f0 Revision to paged attention checks (#133)
* Model: Clean up paged attention checks

* Model: Move cache_size checks after paged attn checks
Cache size is only relevant in paged mode

* Model: Fix no_flash_attention

* Model: Remove no_flash_attention
Ability to use flash attention is auto-detected, so this flag is unneeded. Uninstall flash attention to disable it on supported hardware.
2024-06-09 17:28:11 +02:00
DocShotgun
55d979b7a5 Update dependencies, support Python 3.12, update for exl2 0.1.5 (#134)
* Dependencies: Add wheels for Python 3.12

* Model: Switch fp8 cache to Q8 cache

* Model: Add ability to set draft model cache mode

* Dependencies: Bump exllamav2 to 0.1.5

* Model: Support Q6 cache

* Config: Add Q6 cache and draft_cache_mode to config sample
2024-06-09 17:27:39 +02:00
Orion
6cc3bd9752 feat: list support in message.content (#122) 2024-06-03 19:57:15 +02:00