tabbyAPI

mirror of https://github.com/theroyallab/tabbyAPI.git synced 2026-03-15 00:07:28 +00:00

Author	SHA1	Message	Date
kingbri	933404c185	Model: Warn user if terminating jobs If skip_wait is true, it's best to let the user know that all jobs will be forcibly cancelled. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-15 11:34:16 -04:00
kingbri	9dae461142	Model: Attempt to recreate generator on a fatal error If a job causes the generator to error, tabby stops working until a relaunch. It's better to try establishing a system of redundancy and remake the generator in the event that it fails. May replace this with an exit signal for a fatal error instead, but not sure. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-15 01:09:49 -04:00
kingbri	6019c93637	Networking: Gate sending tracebacks over the API It's possible that tracebacks can give too much info about a system when sent over the API. Gate this under a flag to send them only when debugging since this feature is still useful. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-14 10:30:11 -04:00
Brian Dashore	ddad422d8b	Merge pull request #150 from AmgadHasan/fixDockerWorkflow Fix docker compose volume mount	2024-07-12 14:38:37 -04:00
Brian Dashore	ae4ba76fab	Merge pull request #147 from ai-and-i/stream_options Support stream_options argument to get usage info in streaming mode	2024-07-12 14:38:20 -04:00
kingbri	c1b61441f4	OAI: Fix usage chunk return Place the logic into their proper utility functions and cleanup the code with formatting. Also, OAI's docs specify that a [DONE] return is needed when everything is finished. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-12 14:37:20 -04:00
kingbri	5917515696	Dependencies: Update flash-attention v2.6.1 Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-12 10:09:49 -04:00
Amgad Hasan	2e5cf0ea3f	Fix docker compose volume mount	2024-07-12 13:23:58 +00:00
Volodymyr Kuznetsov	b149d3398d	OAI: support stream_options argument	2024-07-11 18:37:50 -07:00
kingbri	073e9fa6f0	Dependencies: Bump ExllamaV2 v0.1.7 Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-11 14:22:50 -04:00
kingbri	9fc3fc4c54	OAI: Amend comments Clarify what the user can and can't see. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-11 14:22:50 -04:00
kingbri	1f46a1130c	OAI: Restrict list permissions for API keys API keys are not allowed to view all the admin's models, templates, draft models, loras, etc. Basically anything that can be viewed on the filesystem outside of anything that's currently loaded is not allowed to be returned unless an admin key is present. This change helps preserve user privacy while not erroring out on list endpoints that the OAI spec requires. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-11 14:22:50 -04:00
kingbri	10890913b8	Auth: Revert x-admin-key allowance in API key check These kinda clash with each other. Use the correct header for the correct endpoint. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-11 14:22:50 -04:00
kingbri	dfb4c51d5f	OAI: Fix function idioms Make functions mean the same thing to avoid confusion. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-11 14:22:50 -04:00
kingbri	b9a58ff01b	Auth: Make key permission check work on Requests Pass a request and internally unwrap the headers. In addition, allow X-admin-key to get checked in an API key request. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-11 14:22:49 -04:00
Brian Dashore	ff15eed85d	Update README.md Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-08 21:26:11 +00:00
kingbri	5c293499bd	OAI: Reorder functions Reordering routes changes the order of appearance on documentation. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-08 15:27:08 -04:00
kingbri	521d21b9f2	OAI: Add return types for docs Adding return types allows for responses to get included in the autogenerated docs. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-08 15:23:41 -04:00
kingbri	62e495fc13	Model: Grammar: Fix lru_cache clear function It's cache_clear not clear_cache. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-08 15:10:15 -04:00
Brian Dashore	17438288c7	Merge pull request #146 from theroyallab/tokenizer_data_fix Tokenizer data fix	2024-07-08 15:08:29 -04:00
kingbri	c7ce97f119	Tree: Ruff lint Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-08 15:06:28 -04:00
kingbri	8a81fe2eb4	Actions: Add Github Pages deploy Deploys OpenAPI documentation to pages. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-08 15:04:27 -04:00
kingbri	6613e38436	Main: Make openapi export store locally This runs faster than always making a syscall to check if the env var is set. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-08 14:54:06 -04:00
kingbri	ae66e8f9ba	Ruff: Lint Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-08 13:44:12 -04:00
kingbri	b907421285	Main: Fix launch if EXPORT_OPENAPI is unset A default needs to be provided with getenv. Fix that with an empty string. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-08 13:41:44 -04:00
kingbri	a59e8ef9e7	Main: Make EXPORT_OPENAPI only work if true or 1 Use truthy values instead of checking if the variable is set. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-08 12:51:24 -04:00
kingbri	e58e197f0b	Ruff: Remove deprecated rule E999 Syntax error is removed since they'll always be shown when linting anyways. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-08 12:36:15 -04:00
kingbri	933268f7e2	API: Integrate OpenAPI export script Move OpenAPI export as an env var within the main function. This allows for easy export by running main. In addition, an env variable provides global and explicit state to disable conditional wheel imports (ex. Exl2 and torch) which caused errors at first. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-08 12:34:32 -04:00
turboderp	e97ad9cb27	RUFF	2024-07-08 03:51:14 +02:00
turboderp	8bbce3455c	RUFF	2024-07-08 03:49:26 +02:00
kingbri	5e82b7eb69	API: Add standalone method to fetch OpenAPI docs Generates and stores an export of the openapi.json file for use in static websites. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-07 21:35:52 -04:00
turboderp	4cf79c5ae1	Clear tokenizer_data cache when unloading model	2024-07-08 03:31:05 +02:00
turboderp	b7e7df1220	Move tokenizer_data cache to global scope	2024-07-08 02:54:49 +02:00
turboderp	4d0bb1ffc3	Cache creation tokenizer_data in LMFE	2024-07-08 00:51:59 +02:00
turboderp	bb8b02a60a	Wrap arch_compat_overrides in try block Quick fix until exllamav2 0.1.7 releases, since the function isn't defined for 0.1.6.	2024-07-07 07:54:05 +02:00
kingbri	773639ea89	Model: Fix flash-attn checks If flash attention is already turned off by exllamaV2 itself, don't try creating a paged generator. Also condense all the redundant logic into one if statement. Also check arch_compat_overrides to see if flash attention should be disabled for a model arch (ex. Gemma 2) Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-06 20:58:24 -04:00
kingbri	27d2d5f3d2	Config + Model: Allow for default fallbacks from config for model loads Previously, the parameters under the "model" block in config.yml only handled the loading of a model on startup. This meant that any subsequent API request required each parameter to be filled out or use a sane default (usually defaults to the model's config.json). However, there are cases where admins may want an argument from the config to apply if the parameter isn't provided in the request body. To help alleviate this, add a mechanism that works like sampler overrides where users can specify a flag that acts as a fallback. Therefore, this change both preserves the source of truth of what parameters the admin is loading and adds some convenience for users that want customizable defaults for their requests. This behavior may change in the future, but I think it solves the issue for now. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-06 17:50:58 -04:00
kingbri	d03752e31b	Issues: Fix template Correct Discord invite link. Signed-off-by: kingbri <bdashore3@proton.me>	2024-06-23 21:52:01 -04:00
kingbri	45fae89af6	Update README Signed-off-by: kingbri <bdashore3@proton.me>	2024-06-23 21:50:17 -04:00
kingbri	c5ea2abe24	Dependencies: Update ExllamaV2 v0.1.6 Signed-off-by: kingbri <bdashore3@proton.me>	2024-06-23 21:45:04 -04:00
kingbri	d85b526644	Dependencies: Pin numpy v2.x breaks many upstream dependencies (torch). Pin until repos are fixed. Signed-off-by: kingbri <bdashore3@proton.me>	2024-06-23 21:40:09 -04:00
DocShotgun	107436f601	Dependencies: Fix AMD triton (#139)	2024-06-18 15:19:27 +02:00
Brian Dashore	06ee610a97	Update README Signed-off-by: kingbri <bdashore3@proton.me>	2024-06-17 03:56:47 +00:00
kingbri	c575105e41	ExllamaV2: Cleanup log placements Move the large import errors into the check functions themselves. This helps reduce the difficulty in interpreting where errors are coming from. Signed-off-by: kingbri <bdashore3@proton.me>	2024-06-16 00:16:03 -04:00
Glenn Maynard	8da7644571	Fix exception unloading models. (#138) self.generator is None if a model load fails or is cancelled.	2024-06-15 23:44:29 +02:00
DocShotgun	85387d97ad	Fix disabling flash attention in exl2 config (#136) * Model: Fix disabling flash attention in exl2 config * Model: Pass no_flash_attn to draft config * Model: Force torch flash SDP off in compatibility mode	2024-06-12 20:00:46 +02:00
DocShotgun	156b74f3f0	Revision to paged attention checks (#133) * Model: Clean up paged attention checks * Model: Move cache_size checks after paged attn checks Cache size is only relevant in paged mode * Model: Fix no_flash_attention * Model: Remove no_flash_attention Ability to use flash attention is auto-detected, so this flag is unneeded. Uninstall flash attention to disable it on supported hardware.	2024-06-09 17:28:11 +02:00
DocShotgun	55d979b7a5	Update dependencies, support Python 3.12, update for exl2 0.1.5 (#134) * Dependencies: Add wheels for Python 3.12 * Model: Switch fp8 cache to Q8 cache * Model: Add ability to set draft model cache mode * Dependencies: Bump exllamav2 to 0.1.5 * Model: Support Q6 cache * Config: Add Q6 cache and draft_cache_mode to config sample	2024-06-09 17:27:39 +02:00
DocShotgun	dcd9428325	Model: Warn if cache size is too small for CFG (#132)	2024-06-05 19:40:14 +02:00
DocShotgun	e391d84e40	More extensive checks for paged mode support (#121) * Model: More extensive checks for paged attention Previously, TabbyAPI only checked for whether the user's hardware supports flash attention before deciding whether to enable paged mode. This adds checks for whether no_flash_attention is set, whether flash-attn is installed, and whether the installed version supports paged attention. * Tree: Format * Tree: Lint * Model: Check GPU architecture first Check GPU arch prior to checking whether flash attention 2 is installed	2024-06-05 09:33:21 +02:00
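
The EXPORT_OPENAPI commits listed above (b907421285 and a59e8ef9e7) describe reading the variable with an empty-string default and enabling the export only for "true" or "1". The following is a minimal Python sketch of that kind of check, not the project's actual code: the function name export_openapi_enabled is hypothetical, and the case-insensitive, whitespace-tolerant comparison is an assumption the commit messages do not state.

```python
import os
from typing import Mapping


def export_openapi_enabled(environ: Mapping[str, str] = os.environ) -> bool:
    """Return True only when EXPORT_OPENAPI is set to "true" or "1".

    Hypothetical helper for illustration; not taken from the repository.
    """
    # An empty-string default keeps .lower() safe when the variable is unset
    # (a lookup without a default would yield None instead of a string).
    value = environ.get("EXPORT_OPENAPI", "")
    # Assumption: the comparison ignores case and surrounding whitespace.
    return value.strip().lower() in ("true", "1")
```

Passing the environment as a mapping makes the check easy to exercise without touching the real process environment, for example export_openapi_enabled({"EXPORT_OPENAPI": "1"}).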

557 Commits
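
Commit 62e495fc13 in the listing points out that the method for emptying a functools.lru_cache is cache_clear, not clear_cache. A short Python sketch of that standard-library behavior follows; build_tokenizer_data is a hypothetical stand-in for an expensive cached function and is not the repository's implementation.

```python
from functools import lru_cache


@lru_cache(maxsize=8)
def build_tokenizer_data(tokenizer_name: str) -> tuple:
    # Hypothetical placeholder for an expensive computation worth caching.
    return (tokenizer_name, len(tokenizer_name))


build_tokenizer_data("example")  # miss: the result is computed and stored
build_tokenizer_data("example")  # hit: the result is served from the cache
info_before = build_tokenizer_data.cache_info()

# The lru_cache wrapper exposes cache_clear(); it has no clear_cache().
build_tokenizer_data.cache_clear()
info_after = build_tokenizer_data.cache_info()
```

Calling cache_clear when the cached inputs become stale (for instance, when the object they were derived from is unloaded) resets both the stored entries and the hit/miss counters.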