Commit Graph

25 Commits

kingbri
94696543bc Model: Warn user if context > max_seq_len
Unlike other backends, tabby attempts to generate even if the context
is greater than the max sequence length, by truncating the given
context.

Rather than artificially erroring out, give a warning that the output
console metrics are going to be incorrect and that the user should
make sure context <= max_seq_len.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-29 01:35:32 -05:00
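A minimal sketch of the check the commit above describes: warn instead of erroring when the prompt exceeds the window. Names like `check_context`, `context_len`, and `max_seq_len` are illustrative, not tabbyAPI's actual API.

```python
import logging

logger = logging.getLogger(__name__)

def check_context(context_len: int, max_seq_len: int) -> bool:
    """Return True if the context fits; otherwise warn and return False.

    The backend still generates (the prompt is truncated), so this only
    flags that the reported console metrics will be inaccurate.
    """
    if context_len > max_seq_len:
        logger.warning(
            "Context length %d exceeds max_seq_len %d; the prompt will be "
            "truncated and console metrics will be inaccurate.",
            context_len,
            max_seq_len,
        )
        return False
    return True
```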
kingbri
cad144126f API: Rename repetition_decay -> repetition_slope
Also fix the fallback to use 0 for sanity checking and validation.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-29 01:13:05 -05:00
kingbri
d47c39da54 API: Don't include draft directory in response
The draft directory should be returned for a draft model request (TBD).

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-23 00:07:56 -05:00
kingbri
71b9a53336 API: Add temperature_last support
Documented in previous commits. Also, for version checking, check the
value in kwargs rather than whether the key is present, since requests
pass default values.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-21 21:20:59 -05:00
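The kwargs point above can be sketched as follows. Because requests always supply a default (e.g. `temperature_last=False`), a presence check like `"temperature_last" in kwargs` is always true; the value must be tested instead. The helper name is hypothetical.

```python
def uses_temperature_last(kwargs: dict) -> bool:
    """Decide whether the request actually enables temperature_last.

    Wrong approach: `"temperature_last" in kwargs` -- this is True even
    when the client only passed the default value of False.
    """
    return bool(kwargs.get("temperature_last", False))
```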
turboderp
3337fe6acc Warning if unsupported samplers are used 2023-11-21 18:35:22 +01:00
turboderp
a54de11cf3 Add new samplers 2023-11-21 18:16:53 +01:00
Veden
f960fac8ff Fix incorrect ratio calculation for draft model 2023-11-19 13:12:53 -08:00
kingbri
4cddd0400c Model: Fix draft model loading
Use draft_config to find the path instead of kwargs.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-19 02:04:02 -05:00
kingbri
31bc418795 Model: Add context in response output
When printing to the console, give information about the context
(ingestion token count).

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-19 00:49:32 -05:00
kingbri
6b9af58cc1 Tree: Fix extraneous bugs and update T/s print
Model: Add extra information to print and fix the divide by zero error.
Auth: Fix validation of API and admin keys to look for the entire key.

References #7 and #6

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-18 22:34:40 -05:00
Brian Dashore
b2410a0436 Merge pull request #4 from waldfee/config_samples
Adds draft model support to config.yml
2023-11-18 13:16:23 -05:00
kingbri
27ebec3b35 Model: Add speculative decoding support via config
Speculative decoding makes use of draft models that ingest the prompt
before forwarding it to the main model.

Add options in the config to support this. API options will occur
in a different commit.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-18 01:42:20 -05:00
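A hypothetical shape for the config options the commit above adds; the key names below are assumptions for illustration, not tabbyAPI's actual schema.

```yaml
# Illustrative only -- key names are assumptions, not the real schema.
draft:
  draft_model_name: my-draft-model   # smaller model used for speculation
  draft_rope_alpha: 1.0
```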
kingbri
2ad79cb9ea Model: Add tokens in responses
Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-17 23:33:48 -05:00
kingbri
9dfa580b1e Model: Add tokens/second output
Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-17 01:16:20 -05:00
kingbri
d5551352bf Model: Fix parsing of stop conditions
Add the EOS token to the stop strings after checking kwargs. If
ban_eos_token is on, don't add the EOS token, for good measure.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-16 17:15:33 -05:00
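The stop-condition logic above can be sketched as a small helper; the function and parameter names are illustrative, not the repo's actual code.

```python
def build_stop_strings(stop: list, eos_token: str, ban_eos_token: bool) -> list:
    """Append the EOS token to the stop strings unless it is banned."""
    stop = list(stop)  # don't mutate the caller's list
    if not ban_eos_token and eos_token not in stop:
        stop.append(eos_token)
    return stop
```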
kingbri
126afdfdc2 Model: Fix gpu split params
GPU split auto is a bool and GPU split is an array of integers for
GBs to allocate per GPU.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-15 00:55:15 -05:00
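A sketch of the two GPU-split settings described above, assuming hypothetical names: `gpu_split_auto` is a bool, and `gpu_split` is a list of per-GPU allocations in GB.

```python
def resolve_gpu_split(gpu_split_auto: bool, gpu_split):
    """Return per-GPU gigabyte allocations, or None for automatic split."""
    if gpu_split_auto:
        return None  # let the backend allocate across GPUs automatically
    return [float(gb) for gb in gpu_split]
```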
kingbri
ea91d17a11 Api: Add ban_eos_token and add_bos_token support
Adds the ability for the client to specify whether to add the BOS
token and ban the EOS token.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-15 00:55:15 -05:00
kingbri
8fea5391a8 Api: Add token endpoints
Support for encoding and decoding with various parameters.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-15 00:55:15 -05:00
kingbri
b625bface9 OAI: Add API-based model loading/unloading and auth routes
Models can be loaded and unloaded via the API. Also add authentication
to use the API and for administrator tasks.

Both types of authorization use different keys.

Also fix the unload function to properly free all used VRAM.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-14 01:17:19 -05:00
kingbri
47343e2f1a OAI: Add models support
The models endpoint fetches all the models that OAI has to offer.
However, since this is an OAI clone, just list the models inside
the user's configured model directory instead.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-13 21:38:34 -05:00
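Listing local models instead of querying OAI, as the commit above describes, could look roughly like this; the directory-per-model layout and the function name are assumptions.

```python
from pathlib import Path

def list_models(model_dir: str) -> list:
    """Return the name of each subdirectory in the configured model dir.

    Each subdirectory is treated as one installed model, mirroring the
    shape of an OAI /v1/models listing.
    """
    return sorted(p.name for p in Path(model_dir).iterdir() if p.is_dir())
```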
kingbri
eee8b642bd OAI: Implement completion API endpoint
Add support for /v1/completions with the option to use streaming
if needed. Also rewrite API endpoints to use async when possible,
since that improves request performance.

Model container parameter names also needed rewrites, and fallback
cases were set to their disabled values.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-13 18:31:26 -05:00
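In the spirit of the streaming endpoint above, generation can be exposed as an async generator that yields text pieces as they are produced. This is a minimal, hypothetical sketch, not tabbyAPI's actual implementation.

```python
import asyncio

async def stream_completion(tokens):
    """Yield generated text piece by piece, as a streaming endpoint would."""
    for tok in tokens:
        await asyncio.sleep(0)  # yield control, as real generation would
        yield tok

async def collect(tokens):
    """Drain the stream into a list (stand-in for an SSE consumer)."""
    return [t async for t in stream_completion(tokens)]
```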
turboderp
4fa4386275 Add new samplers 2023-11-12 08:12:08 +01:00
kingbri
a10c14d357 Config: Switch to YAML and add load progress
YAML is a more flexible format when it comes to configuration. Command-line
arguments are difficult to remember and configure, especially for
an API with complicated option names. Rather than using half-baked
text files, implement a proper config solution.

Also add a progress bar when loading models on the command line.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-12 00:21:16 -05:00
kingbri
5d32aa02cd Tree: Update to use ModelContainer and args
Use command-line arguments to load an initial model if necessary.
API routes are broken, but we should be using the container from
now on as a primary interface with the exllama2 library.

Also these args should be turned into a YAML configuration file in
the future.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-10 23:19:54 -05:00
turboderp
9d34479e3e Model container with generator logic, initial 2023-11-11 02:53:00 +01:00