tabbyAPI

mirror of https://github.com/theroyallab/tabbyAPI.git synced 2026-03-15 00:07:28 +00:00

Author	SHA1	Message	Date
kingbri	d1706fb067	OAI: Remove double logging if request is cancelled Uvicorn can log in both the request disconnect handler and the CancelledError. However, these sometimes don't work and both need to be checked. But, don't log twice if one works. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-22 21:48:59 -04:00
kingbri	14dfaf600a	Args: Add request logging Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-22 21:41:42 -04:00
kingbri	3826815edb	API: Add request logging Log all the parts of a request if the config flag is set. The logged fields are all server side anyways, so nothing is being exposed to clients. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-22 21:40:00 -04:00
kingbri	522999ebb4	Config: Change from gen_logging to logging More accurately reflects the config.yml's sections. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-22 21:15:16 -04:00
kingbri	191600a150	Revert "Model: Skip empty token chunks" This reverts commit `21516bd7b5`. This skips EOS and implementing it the proper way seems more costly than necessary. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-22 18:34:00 -04:00
kingbri	15f891b277	Args: Update to latest config.yml Fix order of params to follow the same flow as config.yml Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-22 16:26:41 -04:00
kingbri	ad4d17bca2	Tree: Format Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-22 12:24:34 -04:00
kingbri	21516bd7b5	Model: Skip empty token chunks This helps make the generation loop more efficient by skipping past chunks that aren't providing any tokens anyways. The offset isn't affected. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-22 12:23:49 -04:00
kingbri	0eedc8ca14	API: Switch from request ID middleware to depends Middleware runs on both the request and response. Therefore, streaming responses had increased latency when processing tasks and sending data to the client which resulted in erratic streaming behavior. Use a depends to add request IDs since it only executes when the request is run rather than expecting the response to be sent as well. For the future, it would be best to think about limiting the time between each tick of chunk data to be safe. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-22 12:19:46 -04:00
kingbri	cae94b920c	API: Add ability to use request IDs Identify which request is being processed to help users disambiguate which logs correspond to which request. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-21 21:01:05 -04:00
kingbri	38185a1ff4	Auth: Fix key check coalesce Prefer the auth-specific headers before the generic authorization header. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-19 10:08:57 -04:00
kingbri	e20a2d504b	API: Fix pydantic validation errors on disconnect poll returns Raise a 422 exception for the disconnect. This prevents pydantic errors when returning a "response" which doesn't contain anything in this case. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-15 14:41:49 -04:00
kingbri	933404c185	Model: Warn user if terminating jobs If skip_wait is true, it's best to let the user know that all jobs will be forcibly cancelled. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-15 11:34:16 -04:00
kingbri	9dae461142	Model: Attempt to recreate generator on a fatal error If a job causes the generator to error, tabby stops working until a relaunch. It's better to try establishing a system of redundancy and remake the generator in the event that it fails. May replace this with an exit signal for a fatal error instead, but not sure. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-15 01:09:49 -04:00
kingbri	6019c93637	Networking: Gate sending tracebacks over the API It's possible that tracebacks can give too much info about a system when sent over the API. Gate this under a flag to send them only when debugging since this feature is still useful. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-14 10:30:11 -04:00
Brian Dashore	ddad422d8b	Merge pull request #150 from AmgadHasan/fixDockerWorkflow Fix docker compose volume mount	2024-07-12 14:38:37 -04:00
Brian Dashore	ae4ba76fab	Merge pull request #147 from ai-and-i/stream_options Support stream_options argument to get usage info in streaming mode	2024-07-12 14:38:20 -04:00
kingbri	c1b61441f4	OAI: Fix usage chunk return Place the logic into their proper utility functions and cleanup the code with formatting. Also, OAI's docs specify that a [DONE] return is needed when everything is finished. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-12 14:37:20 -04:00
kingbri	5917515696	Dependencies: Update flash-attention v2.6.1 Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-12 10:09:49 -04:00
Amgad Hasan	2e5cf0ea3f	Fix docker compose volume mount	2024-07-12 13:23:58 +00:00
Volodymyr Kuznetsov	b149d3398d	OAI: support stream_options argument	2024-07-11 18:37:50 -07:00
kingbri	073e9fa6f0	Dependencies: Bump ExllamaV2 v0.1.7 Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-11 14:22:50 -04:00
kingbri	9fc3fc4c54	OAI: Amend comments Clarify what the user can and can't see. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-11 14:22:50 -04:00
kingbri	1f46a1130c	OAI: Restrict list permissions for API keys API keys are not allowed to view all the admin's models, templates, draft models, loras, etc. Basically anything that can be viewed on the filesystem outside of anything that's currently loaded is not allowed to be returned unless an admin key is present. This change helps preserve user privacy while not erroring out on list endpoints that the OAI spec requires. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-11 14:22:50 -04:00
kingbri	10890913b8	Auth: Revert x-admin-key allowance in API key check These kinda clash with each other. Use the correct header for the correct endpoint. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-11 14:22:50 -04:00
kingbri	dfb4c51d5f	OAI: Fix function idioms Make functions mean the same thing to avoid confusion. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-11 14:22:50 -04:00
kingbri	b9a58ff01b	Auth: Make key permission check work on Requests Pass a request and internally unwrap the headers. In addition, allow X-admin-key to get checked in an API key request. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-11 14:22:49 -04:00
Brian Dashore	ff15eed85d	Update README.md Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-08 21:26:11 +00:00
kingbri	5c293499bd	OAI: Reorder functions Reordering routes changes the order of appearance on documentation. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-08 15:27:08 -04:00
kingbri	521d21b9f2	OAI: Add return types for docs Adding return types allows for responses to get included in the autogenerated docs. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-08 15:23:41 -04:00
kingbri	62e495fc13	Model: Grammar: Fix lru_cache clear function It's cache_clear not clear_cache. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-08 15:10:15 -04:00
Brian Dashore	17438288c7	Merge pull request #146 from theroyallab/tokenizer_data_fix Tokenizer data fix	2024-07-08 15:08:29 -04:00
kingbri	c7ce97f119	Tree: Ruff lint Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-08 15:06:28 -04:00
kingbri	8a81fe2eb4	Actions: Add Github Pages deploy Deploys OpenAPI documentation to pages. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-08 15:04:27 -04:00
kingbri	6613e38436	Main: Make openapi export store locally This runs faster than always making a syscall to check if the env var is set. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-08 14:54:06 -04:00
kingbri	ae66e8f9ba	Ruff: Lint Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-08 13:44:12 -04:00
kingbri	b907421285	Main: Fix launch if EXPORT_OPENAPI is unset A default needs to be provided with getenv. Fix that with an empty string. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-08 13:41:44 -04:00
kingbri	a59e8ef9e7	Main: Make EXPORT_OPENAPI only work if true or 1 Use truthy values instead of checking if the variable is set. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-08 12:51:24 -04:00
kingbri	e58e197f0b	Ruff: Remove deprecated rule E999 Syntax error is removed since they'll always be shown when linting anyways. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-08 12:36:15 -04:00
kingbri	933268f7e2	API: Integrate OpenAPI export script Move OpenAPI export as an env var within the main function. This allows for easy export by running main. In addition, an env variable provides global and explicit state to disable conditional wheel imports (ex. Exl2 and torch) which caused errors at first. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-08 12:34:32 -04:00
turboderp	e97ad9cb27	RUFF	2024-07-08 03:51:14 +02:00
turboderp	8bbce3455c	RUFF	2024-07-08 03:49:26 +02:00
kingbri	5e82b7eb69	API: Add standalone method to fetch OpenAPI docs Generates and stores an export of the openapi.json file for use in static websites. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-07 21:35:52 -04:00
turboderp	4cf79c5ae1	Clear tokenizer_data cache when unloading model	2024-07-08 03:31:05 +02:00
turboderp	b7e7df1220	Move tokenizer_data cache to global scope	2024-07-08 02:54:49 +02:00
turboderp	4d0bb1ffc3	Cache creation tokenizer_data in LMFE	2024-07-08 00:51:59 +02:00
turboderp	bb8b02a60a	Wrap arch_compat_overrides in try block Quick fix until exllamav2 0.1.7 releases, since the function isn't defined for 0.1.6.	2024-07-07 07:54:05 +02:00
kingbri	773639ea89	Model: Fix flash-attn checks If flash attention is already turned off by exllamaV2 itself, don't try creating a paged generator. Also condense all the redundant logic into one if statement. Also check arch_compat_overrides to see if flash attention should be disabled for a model arch (ex. Gemma 2) Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-06 20:58:24 -04:00
kingbri	27d2d5f3d2	Config + Model: Allow for default fallbacks from config for model loads Previously, the parameters under the "model" block in config.yml only handled the loading of a model on startup. This meant that any subsequent API request required each parameter to be filled out or use a sane default (usually defaults to the model's config.json). However, there are cases where admins may want an argument from the config to apply if the parameter isn't provided in the request body. To help alleviate this, add a mechanism that works like sampler overrides where users can specify a flag that acts as a fallback. Therefore, this change both preserves the source of truth of what parameters the admin is loading and adds some convenience for users that want customizable defaults for their requests. This behavior may change in the future, but I think it solves the issue for now. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-06 17:50:58 -04:00
kingbri	d03752e31b	Issues: Fix template Correct Discord invite link. Signed-off-by: kingbri <bdashore3@proton.me>	2024-06-23 21:52:01 -04:00
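
The EXPORT_OPENAPI commits above (933268f7e2, a59e8ef9e7, b907421285, 6613e38436) describe three steps: accept only "true" or "1", pass an empty-string default to getenv, and store the result locally instead of re-reading the environment. The sketch below shows one way that could look. It is not the project's actual code: the helper name `env_flag`, the constant name, and the case-insensitive comparison are all assumptions made for illustration.

```python
import os


def env_flag(name: str) -> bool:
    """Return True only if the variable is "true" or "1" (hypothetical helper)."""
    # The empty-string default keeps .lower() from failing when the variable
    # is unset, since os.getenv would otherwise return None.
    # Lower-casing the value is an assumption; the commit only names "true" and "1".
    return os.getenv(name, "").lower() in ("true", "1")


# Read once at startup and keep the result locally, rather than
# querying the environment every time the flag is needed.
EXPORT_OPENAPI_ENABLED = env_flag("EXPORT_OPENAPI")
```

With this shape, a value such as "yes" or an unset variable both leave the export disabled, which matches the "truthy values" wording of a59e8ef9e7.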

619 Commits
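
Commit 62e495fc13 above notes that the function for emptying an lru_cache is cache_clear, not clear_cache. A minimal sketch of that standard-library behavior follows. The cached function `build_tokenizer_data` is a hypothetical stand-in and is not taken from the repository.

```python
from functools import lru_cache


@lru_cache(maxsize=8)
def build_tokenizer_data(name: str) -> dict:
    # Hypothetical stand-in for an expensive computation worth caching.
    return {"name": name}


first = build_tokenizer_data("demo")
# A second call with the same argument returns the cached object.
assert build_tokenizer_data("demo") is first

# functools.lru_cache exposes cache_clear(); there is no clear_cache().
build_tokenizer_data.cache_clear()
assert build_tokenizer_data.cache_info().currsize == 0
```

Calling cache_clear when a model is unloaded, as commit 4cf79c5ae1 describes for the tokenizer_data cache, keeps stale entries from outliving the model they were built for.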