tabbyAPI

mirror of https://github.com/theroyallab/tabbyAPI.git synced 2026-03-15 00:07:28 +00:00

Author	SHA1	Message	Date
kingbri	5a23b9ebc9	Tree: Format Signed-off-by: kingbri <bdashore3@proton.me>	2024-02-22 01:28:30 -05:00
kingbri	bee26a2f2c	API: Auto-unload on a load request Automatically unload the existing model when calling /load. This was requested many times, and does make more sense in the long run. Signed-off-by: kingbri <bdashore3@proton.me>	2024-02-21 23:00:11 -05:00
kingbri	949248fb94	Config: Add experimental torch cuda malloc backend This option saves some VRAM, but does have the chance to error out. Add this in the experimental config section. Signed-off-by: kingbri <bdashore3@proton.me>	2024-02-14 21:45:56 -05:00
kingbri	c02fe4d1db	API: Fix response creation Change chat completion and text completion responses to be more flexible. Signed-off-by: kingbri <bdashore3@proton.me>	2024-02-08 21:26:53 -05:00
kingbri	0af6a38af3	Model: Add logprobs support Returns token offsets, selected tokens, probabilities of tokens post-sampling, and normalized probability of selecting a token pre-sampling (for efficiency purposes). Only for text completions. Chat completions in a later commit. Signed-off-by: kingbri <bdashore3@proton.me>	2024-02-08 21:26:53 -05:00
kingbri	284f20263f	API: Clean up tokenizing endpoint Split the get tokens function into separate wrapper encode and decode functions for overall code cleanliness. Signed-off-by: kingbri <bdashore3@proton.me>	2024-02-08 21:26:53 -05:00
kingbri	58590a6c57	Config: Add option to force streaming off Many APIs automatically ask for request streaming without giving the user the option to turn it off. Therefore, give the user more freedom by giving a server-side kill switch. Signed-off-by: kingbri <bdashore3@proton.me>	2024-02-07 21:09:59 -05:00
kingbri	1919bf7705	Launch: Make exllamav2 requirement more friendly Add the ability to use an unsafe config flag if needed and migrate the exl2 check to a different file within the exl2 backend code. Signed-off-by: kingbri <bdashore3@proton.me>	2024-02-02 23:36:17 -05:00
kingbri	2ea063cea9	Tree: Require exllamav2 version for startup Exllamav2 is currently supported on all GPUs and versions. Therefore, it should be expected that users use the latest version of exllamav2 to get the latest features. Doing this helps reduce checks that don't really serve any purpose. Signed-off-by: kingbri <bdashore3@proton.me>	2024-02-02 23:36:17 -05:00
kingbri	d3781920b3	OAI: Split up utility functions Just like types, put utility functions in their own separate module based on the route. Signed-off-by: kingbri <bdashore3@proton.me>	2024-02-02 23:36:17 -05:00
kingbri	b14c5443fd	API: Add sampler override switching Allow users to switch the currently overridden samplers via the API so a restart isn't required to switch the overrides. Signed-off-by: kingbri <bdashore3@proton.me>	2024-01-25 00:15:40 -05:00
kingbri	de0ba7214c	API: Add template switching and unload endpoints Templates can be switched and unloaded without reloading the entire model. Signed-off-by: kingbri <bdashore3@proton.me>	2024-01-25 00:15:40 -05:00
kingbri	6c30f24c83	Tree: Unify sampler parameters and add override support Unify API sampler params into a superclass which should make them easier to manage and inherit generic functions from. Not all frontends expose all sampling parameters due to connections with OAI (that handles sampling themselves with the exception of a few sliders). Add the ability for the user to customize fallback parameters from server-side. In addition, parameters can be forced to a certain value server-side in case the repo automatically sets other sampler values in the background that the user doesn't want. Signed-off-by: kingbri <bdashore3@proton.me>	2024-01-25 00:15:40 -05:00
kingbri	78f920eeda	Tree: Refactor code organization Move common functions into their own folder and refactor the backends to use their own folder as well. Also clean up imports and alphabetize the import statements themselves. Finally, move colab and docker into their own folders as well. Signed-off-by: kingbri <bdashore3@proton.me>	2024-01-25 00:15:40 -05:00
kingbri	902e841c39	Main: Add logging for API routes Helps users get started with accessing the docs. Signed-off-by: kingbri <bdashore3@proton.me>	2024-01-10 23:50:11 -05:00
kingbri	c1642076c2	API: Switch unload method to POST GET and POST can be used interchangeably in this case, but adhere to the HTTP spec. Signed-off-by: kingbri <bdashore3@proton.me>	2024-01-04 21:11:36 -05:00
kingbri	451042aadf	Main: Don't load if model_name/loras is blank Previously, if model_name was commented out, a load would not occur. Add the case if model_name or loras is blank which returns None when parsing the YAML. Signed-off-by: kingbri <bdashore3@proton.me>	2024-01-02 13:56:25 -05:00
kingbri	6b04463051	API: Fix CFG reporting The model endpoint wasn't reporting if CFG is on. Signed-off-by: kingbri <bdashore3@proton.me>	2024-01-02 13:54:16 -05:00
kingbri	bb7a8e4614	Config: Add override argparser Add an argparser that casts over to dictionaries of subgroups to integrate with the config. This argparser doesn't contain everything in the config due to complexity issues with CLI args, but will eventually progress to parity. In addition, it's used to override the config.yml rather than replace it. A config arg is also provided if the user wants to fully override the config yaml with another file path. Signed-off-by: kingbri <bdashore3@proton.me>	2024-01-01 14:27:12 -05:00
kingbri	79a57588d5	API: Add template list endpoint Fetches all template names that a user has in the templates directory for chat completions. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-29 22:58:55 -05:00
kingbri	dce8c74edc	API: Add clarification and cleanup autodocs It's possible to override parts of the example JSON to give proper examples of values. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-29 10:28:06 -05:00
kingbri	3622710582	API: Fix num_experts_per_token reporting This wasn't linked to the model config. This value can be 1 if a MoE model isn't loaded. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-28 00:31:14 -05:00
kingbri	c5bbfd97b2	Entrypoint: Load loras after model Prevents an error if the model isn't loaded on startup. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-27 23:55:02 -05:00
kingbri	ac0d6f8869	Tree: Format and cleanup start Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-27 01:17:31 -05:00
kingbri	a71b96a20c	Main: Switch to entrypoint Allows for other modules to access the startup function. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-27 00:34:50 -05:00
kingbri	09ae71aa91	OAI: Add finish to completions OAI spec requires [DONE] to be sent over SSE to signal that a generation is completed. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-25 11:25:38 -05:00
kingbri	703a114f63	Tree: Format Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-23 23:03:28 -05:00
kingbri	c9126c3145	Config: Isolate to a separate file Reduce dependency of globals in main to simplify code a bit. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-23 23:02:37 -05:00
kingbri	0d2e726e82	Main: Fix import formatting Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-23 21:33:15 -05:00
kingbri	3461f8294f	Logging: Clarify preferences Preferences are preferences, not a config. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-23 21:08:10 -05:00
AlpinDale	6a5bbd217c	feat: logging (#39) * add logging * simplify the logger * formatting * final touches * fix format * Model: Add log to metrics Signed-off-by: kingbri <bdashore3@proton.me> --------- Authored-by: AlpinDale <52078762+AlpinDale@users.noreply.github.com>	2023-12-23 04:33:31 +00:00
kingbri	71f6a586f1	Templates: Add error handling for template errors Similar to the transformers library, add an error handler when an exception is fired. This relays the error to the user. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-22 11:59:47 -05:00
AlpinDale	fa47f51f85	feat: workflows for formatting/linting (#35) * add github workflows for pylint and yapf * yapf * docstrings for auth * fix auth.py * fix generators.py * fix gen_logging.py * fix main.py * fix model.py * fix templating.py * fix utils.py * update formatting.sh to include subdirs for pylint * fix model_test.py * fix wheel_test.py * rename utils to utils_oai * fix OAI/utils_oai.py * fix completion.py * fix token.py * fix lora.py * fix common.py * add pylintrc and fix model.py * finish up pylint * fix attribute error * main.py formatting * add formatting batch script * Main: Remove unnecessary global Linter suggestion. Signed-off-by: kingbri <bdashore3@proton.me> * switch to ruff * Formatting + Linting: Add ruff.toml Signed-off-by: kingbri <bdashore3@proton.me> * Formatting + Linting: Switch scripts to use ruff Also remove the file and recent file change functions from both scripts. Signed-off-by: kingbri <bdashore3@proton.me> * Tree: Format and lint Signed-off-by: kingbri <bdashore3@proton.me> * Scripts + Workflows: Format Signed-off-by: kingbri <bdashore3@proton.me> * Tree: Remove pylint flags We use ruff now Signed-off-by: kingbri <bdashore3@proton.me> * Tree: Format Signed-off-by: kingbri <bdashore3@proton.me> * Formatting: Line length is 88 Use the same value as Black. Signed-off-by: kingbri <bdashore3@proton.me> * Tree: Format Update to new line length rules. Signed-off-by: kingbri <bdashore3@proton.me> --------- Authored-by: AlpinDale <52078762+AlpinDale@users.noreply.github.com> Co-authored-by: kingbri <bdashore3@proton.me>	2023-12-22 16:20:35 +00:00
kingbri	a14abfe21c	Templates: Support bos_token and eos_token fields These are commonly seen in huggingface provided chat templates and aren't that difficult to add in. For feature parity, honor the add_bos_token and ban_eos_token parameters when constructing the prompt. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-22 10:33:11 -05:00
kingbri	8fa764bfbe	Auth: Add option to disable authentication This creates a massive security hole, but it's gated behind a flag for users who only use localhost. A warning will pop up when users disable authentication. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-21 23:40:16 -05:00
kingbri	99a798e117	API: Add auth enforcement to draft list This didn't have an API key gate. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-21 23:14:04 -05:00
kingbri	1a8afcb6ad	Generator: Fix semaphore scheduling Non-streaming tasks were not regulated by the semaphore, causing these tasks to interfere with streaming generations. Add helper functions to take in both sync and async functions for callbacks and sequential blocking with the semaphore. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-21 21:39:45 -05:00
kingbri	c9e43e51aa	API: Add route for draft model list Does the same thing as model list except with draft models. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-19 23:45:53 -05:00
kingbri	da69ad8cd3	Requirements: Pin versions for some dependencies Pydantic and Jinja2 need pinned versions. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-19 21:48:04 -05:00
kingbri	1fd38c61de	API: Remove model check dependency for lora list This isn't needed for listing stuff. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-19 21:35:29 -05:00
kingbri	de9a19b5d3	Templating: Add generation prompt appending Append generation prompts if given the flag on an OAI chat completion request. This appends the "assistant" message to the instruct prompt. Defaults to true since this is intended behavior. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-18 23:53:47 -05:00
kingbri	51ca1ff396	Tree: Switch to Pydantic 2 Pydantic 2 has more modern methods and stability compared to Pydantic 1 Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-18 23:53:47 -05:00
kingbri	f631dd6ff7	Templates: Switch to Jinja2 Jinja2 is a lightweight template parser that's used in Transformers for parsing chat completions. It's much more efficient than Fastchat and can be imported as part of requirements. Also allows for unblocking Pydantic's version. Users now have to provide their own template if needed. A separate repo may be usable for common prompt template storage. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-18 23:53:47 -05:00
kingbri	1a331afe3a	OAI: Add cache_mode parameter to model Mistakenly forgot that the user can choose what cache mode to use when loading a model. Also add when fetching model info. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-16 02:47:50 -05:00
kingbri	eb8ccb9783	Tree: Fix linter issues Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-12 23:58:19 -05:00
kingbri	083df7d585	Tree: Add generation logging support Generations can be logged in the console along with sampling parameters if the user enables it in config. Metrics are always logged at the end of each prompt. In addition, the model endpoint tells the user if they're being logged or not for transparency purposes. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-12 23:43:35 -05:00
kingbri	db87efde4a	OAI: Add ability to specify fastchat prompt template Sometimes fastchat may not be able to detect the prompt template from the model path. Therefore, add the ability to set it in config.yml or via the request object itself. Also send the provided prompt template on model info request. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-10 15:43:58 -05:00
kingbri	9f195af5ad	Main: Fix function calls Some function names were declared twice. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-10 13:28:21 -05:00
kingbri	fd9f3eac87	Model: Add params to current model endpoint Grabs the current model rope params, max seq len, and the draft model if applicable. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-10 00:40:56 -05:00
kingbri	0f4290f05c	Model: Format Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-09 22:48:42 -05:00

91 Commits
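
Commit 1a8afcb6ad describes helper functions that accept both sync and async callbacks and run them sequentially behind a semaphore, so that non-streaming tasks no longer interfere with streaming generations. The following is a minimal sketch of that idea using only the standard library. It is not taken from the tabbyAPI source: the names `call_with_semaphore`, `streaming_job`, `blocking_job`, and `demo` are hypothetical, and the real helpers may be structured differently.

```python
import asyncio
import inspect


async def call_with_semaphore(semaphore, callback):
    """Run a zero-argument sync or async callback while holding the semaphore.

    Hypothetical helper for illustration; not the project's actual function.
    """
    async with semaphore:
        result = callback()
        # Calling an async function returns an awaitable, so await it here.
        # A plain function has already returned its final value.
        if inspect.isawaitable(result):
            result = await result
        return result


async def demo():
    # A semaphore of size 1 lets only one job run at a time.
    semaphore = asyncio.Semaphore(1)
    order = []

    async def streaming_job():
        order.append("stream start")
        await asyncio.sleep(0.01)  # stands in for a streaming generation
        order.append("stream end")
        return "streamed"

    def blocking_job():
        order.append("blocking")  # stands in for a non-streaming generation
        return "blocked"

    results = await asyncio.gather(
        call_with_semaphore(semaphore, streaming_job),
        call_with_semaphore(semaphore, blocking_job),
    )
    return results, order


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this sketch the non-streaming job waits until the streaming job releases the semaphore, which is the scheduling behavior the commit message describes. Passing the semaphore as an argument is a design choice made here to keep the example self-contained.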