Commit Graph

658 Commits

Author SHA1 Message Date
kingbri
21712578cf API: Add allowed_tokens support
This is the opposite of banned tokens: an Exllama-specific
implementation of #181.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-29 21:44:42 -04:00
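An allowed-tokens filter like the one this commit describes could be sketched as follows. This is a minimal illustration, not the actual implementation; the function name and the plain-list logits are hypothetical stand-ins for the sampler's tensors.

```python
import math

def apply_allowed_tokens(logits: list[float], allowed: set[int]) -> list[float]:
    # Inverse of a banned-token filter: instead of pushing a few IDs
    # to -inf, push every ID *except* the allowed set to -inf.
    if not allowed:  # no restriction configured; pass logits through
        return logits
    return [
        score if token_id in allowed else -math.inf
        for token_id, score in enumerate(logits)
    ]
```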
kingbri
10d9419f90 Model: Add BOS token to prompt logs
If add_bos_token is enabled, the BOS token is prepended to the logged
prompt when prompt logging is enabled.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-29 21:15:09 -04:00
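The idea reduces to mirroring the tokenizer's BOS behavior in the log output. A minimal sketch, with a hypothetical function name and a placeholder `<s>` BOS string:

```python
def format_prompt_log(prompt: str, add_bos_token: bool, bos_token: str = "<s>") -> str:
    # Mirror tokenization in the log: if a BOS token will be added to
    # the sequence, show it in the logged prompt as well so the log
    # matches what the model actually sees.
    return f"{bos_token}{prompt}" if add_bos_token else prompt
```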
kingbri
96fce34253 Dependencies: Update ExllamaV2
v0.2.0

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-28 18:34:00 -04:00
kingbri
a00d972054 Server: Remove unused comments
Leftovers from the new API server log system.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-27 21:45:51 -04:00
kingbri
4958c06813 Model: Remove and format comments
The comment in __init__ was outdated and all the kwargs are the
config options anyways.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-27 21:43:40 -04:00
TerminalMan
80198ca056 API: Add /v1/health endpoint (#178)
* Add healthcheck

- localhost only /healthcheck endpoint
- cURL healthcheck in docker compose file

* Update Healthcheck Response

- change endpoint to /health
- remove localhost restriction
- add docstring

* move healthcheck definition to top of the file

- make the healthcheck show up first in the openAPI spec

* Tree: Format
2024-08-27 21:37:41 -04:00
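A health endpoint of this kind is just a static liveness payload. The handler below is an illustrative sketch (the real response shape and route registration live in the PR, not here); the payload keys are assumptions.

```python
import asyncio

async def health_check() -> dict:
    # Static liveness payload for GET /health; an orchestrator (such
    # as the docker-compose cURL healthcheck mentioned above) polls
    # this. Defining it first puts it at the top of the OpenAPI spec.
    return {"status": "healthy"}
```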
Amgad Hasan
872eeed581 Build and push docker image (#171)
* Create docker-image.yml

* Update docker-image.yml
2024-08-26 16:18:10 -04:00
Ben Gitter
045bc98333 Remove rogue print statements within chat_completion.py (#174)
* rogue prompt print

* remove print pt2

* Print Removal Final
2024-08-23 21:28:37 -04:00
turboderp
fe3253f3a9 Model: Account for tokenizer lazy init 2024-08-23 23:51:53 +02:00
turboderp
a676c4bf38 Model: Formatting 2024-08-23 11:15:30 +02:00
turboderp
a3733caeda Model: Fix draft model cache initialization 2024-08-23 11:08:49 +02:00
kingbri
364032e39e Config: Remove development flag from tensor parallel
Exists in stable ExllamaV2 version.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-22 14:15:19 -04:00
kingbri
565b0300d6 Dependencies: Update Exllamav2
v0.1.9

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-22 14:15:19 -04:00
kingbri
078fbf1080 Model: Add quantized cache support for tensor parallel
Newer ExllamaV2 v0.1.9-dev builds have a quantized cache implemented.
Add those APIs.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-22 14:15:19 -04:00
kingbri
871c89063d Model: Add Tensor Parallel support
Use the tensor parallel loader when the flag is enabled. The new loader
has its own autosplit implementation, so gpu_split_auto isn't valid
here.

Also make it easier to determine which cache type to use rather than
multiple if/else statements.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-22 14:15:19 -04:00
kingbri
5002617eac Model: Split cache creation into a common function
Unifies the switch statement across both draft and model caches.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-22 14:15:19 -04:00
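The unified switch this commit describes could look like the sketch below. The mode strings and the fallback behavior are assumptions; the class names mirror ExllamaV2's quantized cache types, returned here as strings so the sketch stays self-contained.

```python
def select_cache_class(cache_mode: str) -> str:
    # One mapping shared by the main model and the draft model,
    # replacing duplicated if/else chains. Unknown modes fall back
    # to the full-precision FP16 cache.
    cache_map = {
        "Q4": "ExLlamaV2Cache_Q4",
        "Q6": "ExLlamaV2Cache_Q6",
        "Q8": "ExLlamaV2Cache_Q8",
    }
    return cache_map.get(cache_mode.upper(), "ExLlamaV2Cache")
```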
kingbri
ecaddec48a Docker-compose: Add models to bind mounts
At least one bind mount is required in the volumes YAML block,
otherwise the docker build fails. Models is a safe default since the
directory always exists.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-19 22:07:53 -04:00
Amgad Hasan
dae394050e Improve docker deployment configuration (#163) 2024-08-18 15:19:18 -04:00
kingbri
a51acb9db4 Templates: Switch to async jinja engine
This prevents any possible blocking of the event loop due to template
rendering.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-17 12:03:41 -04:00
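Jinja2's async mode is the mechanism in play here. A minimal sketch (assuming jinja2 is available; the template string is illustrative): with `enable_async=True`, templates are rendered via an awaitable path so a long render yields to the event loop instead of blocking it.

```python
import asyncio
from jinja2 import Environment

# enable_async compiles templates with awaitable render paths
env = Environment(enable_async=True)
template = env.from_string("Hello, {{ name }}!")

async def render_prompt() -> str:
    # render_async is available because the environment is async
    return await template.render_async(name="world")
```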
kingbri
b4752c1e62 Templates: Revert to load metadata on runtime
Metadata is generated via a template's module. This requires a single
iteration through the template. If a template tries to access a passed
variable that doesn't exist, it will error.

Therefore, generate the metadata at runtime to prevent these errors
from happening. To optimize further, cache the metadata after the
first generation to prevent the expensive call of making a template
module.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-17 11:44:42 -04:00
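The lazy-plus-cached pattern described above can be sketched like this. The class and field names are hypothetical, and the metadata dict is a stub for the single expensive pass over the template:

```python
class PromptTemplate:
    def __init__(self, source: str):
        self.source = source
        self._metadata: dict | None = None

    def metadata(self) -> dict:
        # Building a template module is expensive and can raise if the
        # template reads an undefined variable, so extract metadata
        # lazily and cache the result after the first call.
        if self._metadata is None:
            self._metadata = self._extract_metadata()
        return self._metadata

    def _extract_metadata(self) -> dict:
        # Stand-in for the single iteration through the template
        return {"stop_strings": [], "tool_start": None}
```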
kingbri
617ac12150 Update README
Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-17 00:35:42 -04:00
Ben Gitter
70b9fc95de [WIP] OpenAI Tools Support/Function calling (#154)
* returning stop str if exists from gen

* added chat template for firefunctionv2

* pulling tool vars from template

* adding parsing for tool inputs/outputs

* passing tool data from endpoint to chat template, adding tool_start to the stop list

* loosened typing on the response tool call, leaning more on the user supplying a quality schema if they want a particular format

* non streaming generation prototype

* cleaning template

* Continued work with type, ingestion into template, and chat template for fire func

* Correction: a streaming tool call comes back as a delta object, not inside ChatCompletionResponseChoice (per chat_completion_chunk.py in the OpenAI library).

* Ruff formatting

* Moved stop string and tool updates out of prompt creation func

Updated tool pydantic to match OAI

Support for streaming

Updated generate tool calls to use flag within chat_template and insert tool reminder

* Llama 3.1 chat templates

Updated fire func template

* renamed llama3.1 to chatml_with_headers

* update name of template

* Support for calling a tool start token rather than the string.

Simplified tool_params

Warn when gen_settings are overridden because the user set temp to 0

Corrected schema and tools to the proper types for function args (str, for some reason)

* draft groq tool use model template

* changed headers to vars for readability (but mostly because some models are weird about newlines after headers, so this is an easier way to change them globally)

* Clean up comments and code in chat comp

* Post-processed tool call to meet the OAI spec rather than forcing the model to write JSON in a string in the middle of the call.

* changed example back to args as JSON rather than a string of JSON

* Standardize chat templates to each other

* cleaning/rewording

* stop elements can also be ints (tokens)

* Cleaning/formatting

* added special tokens for tools and tool_response as specified in description

* Cleaning

* removing aux templates - going to live in llm-promp-templates repo instead

* Tree: Format

Signed-off-by: kingbri <bdashore3@proton.me>

* Chat Completions: Don't include internal tool variables in OpenAPI

Use SkipJsonSchema to suppress inclusion with the OpenAPI JSON. The
location of these variables may need to be changed in the future.

Signed-off-by: kingbri <bdashore3@proton.me>

* Templates: Deserialize metadata on template load

Since we're only looking for specific template variables that are
static in the template, it makes more sense to render when the template
is initialized.

Signed-off-by: kingbri <bdashore3@proton.me>

* Tools: Fix comments

Adhere to the format style of comments in the rest of the project.

Signed-off-by: kingbri <bdashore3@proton.me>

---------

Co-authored-by: Ben Gitter <gitterbd@gmail.com>
Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-17 00:16:25 -04:00
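The SkipJsonSchema trick mentioned in the PR can be shown with a small Pydantic v2 model. The field names here are hypothetical, not the project's actual request schema; the point is that an annotated field stays validated but disappears from the generated JSON schema (and hence the OpenAPI page).

```python
from typing import Optional

from pydantic import BaseModel
from pydantic.json_schema import SkipJsonSchema

class ChatCompletionRequest(BaseModel):
    model: str = "default"
    # Internal tool state: still validated on input, but excluded
    # from the generated JSON schema / OpenAPI documentation
    tool_precursor: SkipJsonSchema[Optional[str]] = None
```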
kingbri
9cc0e70098 Actions: Build kobold docs subpage
Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-08 16:40:50 -04:00
kingbri
685e3836e9 Args: Add api-servers to parser
Also run OpenAPI export after args/config are parsed.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-08 16:32:29 -04:00
kingbri
63650d2c3c Model: Disable banned strings if grammar is used
ExllamaV2 filters don't support rewinding, which is what banned
strings rely on. Therefore, banned strings aren't compatible with
constrained generation via LMFE or outlines for now.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-05 11:08:58 -04:00
kingbri
34281c2e14 Start: Add --force-reinstall argument
Forces a reinstall of dependencies in the event that one is corrupted
or broken.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-04 11:14:38 -04:00
kingbri
ab6c3a53b9 Start: Remove eager upgrade strategy
This will upgrade second-level pinned dependencies to their latest
versions, which is not ideal.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-04 10:50:57 -04:00
kingbri
8ff2586d45 Start: Fix pip update, method calls, and logging
platform.system() was not called in some places, breaking the
ternary on Windows.

Pip's --upgrade flag does not actually update dependencies to their
latest versions. That's what the --upgrade-strategy eager flag is for.

Tell the user where their start preferences are coming from.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-04 10:30:26 -04:00
kingbri
6a0cfd731b Main: Only import psutil when the experimental function is run
Experimental options shouldn't be imported at the top level until the
testing period is over.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-03 22:00:15 -04:00
kingbri
b6d2676f1c Start: Give the user a hint when a module can't be imported
If an ImportError or ModuleNotFoundError is raised, tell the user
to run the update scripts.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-03 21:59:06 -04:00
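The hint described above is a thin wrapper around the import. A sketch (the wrapper name and message wording are assumptions); note that `ModuleNotFoundError` subclasses `ImportError`, so one except clause covers both cases the commit mentions:

```python
import importlib
import sys

def import_with_hint(module_name: str):
    # Turn a bare dependency traceback into an actionable message
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        print(
            f"Could not import '{module_name}' ({exc}). "
            "Try running the update scripts to repair dependencies.",
            file=sys.stderr,
        )
        raise
```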
kingbri
1aa934664c Issues: Update issue templates
Use forms instead of markdown templates.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-03 21:59:02 -04:00
kingbri
87b6a31fad Update .gitignore
Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-03 20:59:28 -04:00
kingbri
4868fc6b10 Update README
Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-03 20:58:26 -04:00
kingbri
5fb9cdc2b1 Dependencies: Add Python 3.12 specific dependencies
Install a prebuilt fastparquet wheel for Windows and add setuptools
since torch may require it for some reason.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-03 17:43:14 -04:00
kingbri
2a33ebbf29 Model: Bypass lock checks when shutting down
Previously, when a SIGINT was emitted while a model load was running,
the API didn't shut down until the load finished due to waiting for
the lock. However, when shutting down, the lock doesn't matter since
the process is being killed anyway.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-03 16:05:34 -04:00
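The shutdown path can bypass the held lock roughly as sketched below. Class, method, and flag names are hypothetical stand-ins for the project's model container; the demo simulates a load holding the lock while shutdown proceeds anyway.

```python
import asyncio

class ModelContainer:
    def __init__(self):
        self.load_lock = asyncio.Lock()

    async def unload(self, skip_wait: bool = False) -> str:
        # During shutdown the process dies anyway, so don't wait
        # for an in-flight load to release the lock.
        if skip_wait:
            return self._free()
        async with self.load_lock:
            return self._free()

    def _free(self) -> str:
        return "unloaded"

async def shutdown_demo() -> str:
    container = ModelContainer()
    await container.load_lock.acquire()  # simulate a load in progress
    return await container.unload(skip_wait=True)
```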
Brian Dashore
65c16f2a7c Merge pull request #161 from theroyallab/new-start-scripts
Fix pip index bandwidth costs and update start scripts
2024-08-03 15:21:02 -04:00
kingbri
8703b23f89 Start: Make linux scripts executable
Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-03 15:19:31 -04:00
kingbri
b795bfc7b2 Start: Split some prints up
Newlines can be helpful at times.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-03 15:14:40 -04:00
kingbri
65e758e134 Tree: Format
Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-03 15:08:24 -04:00
kingbri
7ce46cc2da Start: Rewrite start scripts
Start scripts no longer update dependencies by default due to pip
mishandling its caches. Also add dedicated update scripts and save
options to a JSON file instead of a text one.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-03 13:03:24 -04:00
kingbri
e66d213aef Revert "Dependencies: Use hosted pip index instead of Github"
This reverts commit f111052e39.

This was a bad idea since the netlify server has limited bandwidth.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-03 11:35:26 -04:00
kingbri
7bf2b07d4c Signals: Exit on async cleanup
The async signal exit function should be the internal mechanism for
exiting the program. In addition, prevent the handler from being
called twice by adding a boolean; this may become an asyncio event
later on.

Also make sure to skip_wait when running model.unload.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-02 15:11:57 -04:00
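The double-invocation guard is just a module-level boolean, as sketched below (names are illustrative; the real handler would kick off the async cleanup where the comment sits):

```python
_exiting = False

def handle_exit_signal() -> bool:
    # Boolean guard so a second SIGINT during cleanup is ignored;
    # the commit notes this may become an asyncio event later.
    global _exiting
    if _exiting:
        return False
    _exiting = True
    # ...schedule the async exit/cleanup task here...
    return True
```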
kingbri
b124797949 Dependencies: Re-add sentence-transformers
This is actually required for infinity to load a model.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-02 14:35:58 -04:00
kingbri
56619810bf Dependencies: Switch sentence-transformers to infinity-emb
Leftover before the transition.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-02 13:34:47 -04:00
kingbri
3e42211c3e Config: Embeddings: Make embeddings_device a default when API loading
When loading from the API, the fallback for embeddings_device will be
the same as the config.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-01 13:59:49 -04:00
kingbri
54aeebaec1 API: Fix return of current embeddings model
Return a ModelCard instead of a ModelList.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-01 13:43:31 -04:00
kingbri
0bcb4e4a7d Model: Attach request ID to logs
If multiple logs come in at once, track which log corresponds to
which request.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-01 00:25:54 -04:00
kingbri
9390d362dd Model: Log generation params and metrics after the prompt/response
A user's prompt and response can be large in the console. Therefore,
always log the smaller payloads (ex. gen params + metrics) after
the large chunks.

However, it's recommended to keep prompt logging off anyway, since
it'll result in console spam.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-01 00:19:21 -04:00
Brian Dashore
1bf062559d Merge pull request #158 from AlpinDale/embeddings
feat: add embeddings support via Infinity-emb
2024-07-31 20:33:12 -04:00
kingbri
f111052e39 Dependencies: Use hosted pip index instead of Github
Installing directly from github causes pip's HTTP cache to not
recognize that the correct version of a package is already installed.
This causes a redownload.

When using the Start.bat script, it updates dependencies automatically
to keep users on the latest versions of a package for security reasons.

A simple pip cache website helps alleviate this problem and allows pip
to find the cached wheels when invoked with an upgrade argument.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-30 20:46:37 -04:00