tabbyAPI

mirror of https://github.com/theroyallab/tabbyAPI.git synced 2026-03-15 00:07:28 +00:00

Author	SHA1	Message	Date
kingbri	7e007f0761	Model: Handle finish chunks and logprobs in separate functions Helps split up and trim the generate_gen function. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-04-24 21:19:03 -04:00
kingbri	3f09fcd8c9	Model: Make model params return a model card The model card is a unified structure for sharing model params. Rather than kwargs, use this instead. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-04-21 23:15:46 -04:00
kingbri	13beef8021	Model: Move find_template function to templating Makes sense to extract to a utility function instead. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-04-20 18:27:53 -04:00
kingbri	8e238fa8f6	Model: Move calculate_rope_alpha from backend Makes more sense to use as a utility function. Also clarify how the vars are set. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-04-20 18:20:19 -04:00
kingbri	b751e0a1d5	Model: Move inline overrides to common This is applied across containers. Doesn't make sense to put this method in the backend. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-04-20 17:51:57 -04:00
kingbri	034682fcf1	Backends: Add base model container Base class for all model containers. Used in the shared model file for interface. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-04-20 17:24:10 -04:00
kingbri	f15ac1f69d	Model: Reject model requests when unloading If a model is being unloaded, that means it's being shut down and no requests should be accepted from then on. Also, remove model_is_loaded since we simply check if the container is None now. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-04-19 22:34:06 -04:00
kingbri	3f1d5d396e	Model: Store active jobs in tabby Rather than relying on the generator, use tabby to store the active job IDs. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-04-16 13:17:55 -04:00
kingbri	1afc9b983e	Model: Remove generate_window Not required since we error with exceeding the max_seq_len Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-04-16 12:59:02 -04:00
kingbri	2f5235e1a3	Model: Extract settings creation to a separate function Maybe move this out of the class entirely, but for now, it makes sense to encapsulate this logic. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-04-16 12:57:27 -04:00
kingbri	5697204e47	Merge branch 'main' into model-rewrite	2025-04-16 02:15:46 -04:00
kingbri	6bb5f8f599	Sampling: Rewrite mirostat_mode parameter Apparently the "mirostat" parameter has been updated by frontends to pass a number. ExllamaV2 expects a boolean, but most pass a number anyway, so just alias mirostat_mode and mirostat together. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-04-16 02:13:55 -04:00
kingbri	3084ef9fa1	Model + API: Migrate to use BaseSamplerParams kwargs is pretty ugly when figuring out which arguments to use. The base request falls back to defaults anyways, so pass in the params object as is. However, since Python's typing isn't like TypeScript where types can be transformed, the type hinting has a possibility of None showing up despite there always being a value for some params. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-04-16 00:50:05 -04:00
kingbri	dcb36e9ab2	Model: Remove extra unwraps The base sampler request already specifies the defaults, so don't unwrap in this way. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-04-15 23:38:46 -04:00
kingbri	11ed3cf5ee	Model: Cleanup logging and remove extraneous declarations Log the parameters passed into the generate gen function rather than the generation settings to reduce complexity. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-04-15 23:31:12 -04:00
kingbri	79f9c6e854	Model: Remove num_experts_per_token This shouldn't even be an exposed option since changing it always breaks inference with the model. Let the model's config.json handle it. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-03-19 11:52:10 -04:00
kingbri	9f649647f0	Model + API: GPU split updates and fixes For the TP loader, GPU split cannot be an empty array. However, defaulting the parameter to an empty array makes it easier to calculate the device list. Therefore, cast an empty array to None using falsy comparisons at load time. Also add draft_gpu_split to the load request. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-02-15 21:50:14 -05:00
kingbri	beb6d8faa5	Model: Adjust draft_gpu_split and add to config The previous code overrode the existing gpu split and device idx values. This now sets an independent draft_gpu_split value and adjusts the gpu_devices check only if the draft_gpu_split array is larger than the gpu_split array. Draft gpu split is not Tensor Parallel, and defaults to gpu_split_auto if a split is not provided. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-02-08 16:09:46 -05:00
kingbri	bd8256d168	Merge branch 'main' into draft-split	2025-02-08 15:10:44 -05:00
kingbri	b994aae995	Model: Cleanup generation length and page checks Reduce the amount of if statements and combine parts of code. Signed-off-by: kingbri <8082010+bdashore3@users.noreply.github.com>	2024-12-26 23:13:08 -05:00
kingbri	ba2579ff74	Merge branch 'main' into robust-length-checks	2024-12-26 18:00:26 -05:00
kingbri	7878d351a7	Endpoints: Add props endpoint and add more values to model params The props endpoint is a standard used by llamacpp APIs which returns various properties of a model to a server. It's still recommended to use /v1/model to get all the parameters a TabbyAPI model has. Also include the contents of a prompt template when fetching the current model. Signed-off-by: kingbri <8082010+bdashore3@users.noreply.github.com>	2024-12-26 17:32:19 -05:00
DocShotgun	4d11323c17	Tree: Format	2024-12-17 09:37:33 -08:00
DocShotgun	5da335eb3d	Model: Robust request length checking in generator * Ensure that length of positive/negative prompt + max_tokens does not exceed max_seq_len * Ensure that total required pages for CFG request does not exceed allocated cache_size	2024-12-17 09:34:43 -08:00
DocShotgun	7f899734c0	Grammar: Cache the engine vocabulary * Avoid rebuilding the KBNF engine vocabulary on every grammar-enabled request	2024-12-05 21:36:37 -08:00
lucy	ab1f4b7a6a	add draft_gpu_split option	2024-11-27 02:52:19 +01:00
DocShotgun	6f2dc2ea99	Grammar: Fix syntax, lint	2024-11-24 11:35:45 -08:00
DocShotgun	8f209efb99	Grammar: Clean up KBNF implementation * Also remove empty cache clear function	2024-11-24 10:44:45 -08:00
DocShotgun	0836a9317f	Grammar: Initial Formatron regex and JSON schema implementation * Replace LMFE's regex and JSON schema filters with Formatron's * Remove Outlines EBNF filter in preparation for Formatron KBNF filter * TODO: Implement Formatron KBNF filter	2024-11-23 10:27:37 -08:00
kingbri	eadc71a4c3	Model: Add unload and error messages for vision If vision is enabled and the model doesn't support it, send an error asking the user to reload. Also, add a method to unload the vision tower. Signed-off-by: kingbri <bdashore3@proton.me>	2024-11-22 14:25:03 -05:00
kingbri	0ab393f09c	Model: Set vision load to False by default Mistake in unwrapping. Vision should be false to allow normal model loading when the flag isn't provided. Signed-off-by: kingbri <bdashore3@proton.me>	2024-11-21 17:54:42 -05:00
kingbri	902045edbb	API: Fix chat completion formatting flow Previously, the flow for parsing chat completion messages and rendering from the prompt template was disconnected between endpoints. Now, create a common function to render and handle everything appropriately afterwards. Signed-off-by: kingbri <bdashore3@proton.me>	2024-11-21 17:51:14 -05:00
kingbri	0fadb1e5e8	Merge branch 'main' into vision	2024-11-19 21:19:21 -05:00
DocShotgun	27d9af50a8	API: Report whether vision is enabled	2024-11-19 12:29:25 -08:00
DocShotgun	5611365c07	OAI: Allow /v1/encode endpoint to handle vision requests * More robust checks for OAI chat completion message lists on /v1/encode endpoint * Added TODO to support other aspects of chat completions * Fix oversight where embeddings was not defined in advance on /v1/chat/completions endpoint	2024-11-19 11:14:37 -08:00
Brian	a69f86098a	Merge pull request #243 from DocShotgun/chunk-size-fix Enforce chunk_size as multiple of 256	2024-11-18 00:40:36 -05:00
DocShotgun	dd41eec8a4	OAI: Initial vision support in OAI chat completions * Support image_url inputs containing URLs or base64 strings following OAI vision spec * Use async lru cache for image embeddings * Add generic wrapper class for multimodal embeddings	2024-11-17 21:23:09 -08:00
DocShotgun	5bb46df3c3	Model: Fix draft model non-FA2 fallback	2024-11-15 21:04:25 -08:00
DocShotgun	37cc701137	Model: Enforce chunk_size as multiple of 256	2024-11-15 20:35:18 -08:00
kingbri	69ac0eb8aa	Model: Add vision loading support Adds the ability to load vision parts of text + image models. Requires an explicit flag in config because there isn't a way to automatically determine whether the vision tower should be used. Signed-off-by: kingbri <bdashore3@proton.me>	2024-11-11 12:10:11 -05:00
kingbri	cc2516790d	Model: Add support for chat_template.json HuggingFace separated the chat template in the newest transformers versions. Signed-off-by: kingbri <bdashore3@proton.me>	2024-11-11 12:10:06 -05:00
kingbri	9530f8c8c7	Model: Add support for chat_template.json HuggingFace separated the chat template in the newest transformers versions. Signed-off-by: kingbri <bdashore3@proton.me>	2024-11-11 12:09:27 -05:00
DocShotgun	603760cecb	Model: Remove override_base_seq_len	2024-10-30 10:03:08 +08:00
TerminalMan	7d18d2e2ca	Refactor the sampling class (#199) * improve validation * remove to_gen_params functions * update changes for all endpoint types * OAI: Fix calls to generation Chat completion and completion need to have prompt split out before pushing to the backend. Signed-off-by: kingbri <bdashore3@proton.me> * Sampling: Convert Top-K values of -1 to 0 Some OAI implementations use -1 as disabled instead of 0. Therefore, add a coalesce case. Signed-off-by: kingbri <bdashore3@proton.me> * Sampling: Format and space out Make the code more readable. Signed-off-by: kingbri <bdashore3@proton.me> * Sampling: Fix mirostat Field items are nested in data within a Pydantic FieldInfo Signed-off-by: kingbri <bdashore3@proton.me> * Sampling: Format Signed-off-by: kingbri <bdashore3@proton.me> * Sampling: Fix banned_tokens and allowed_tokens conversion If the provided string has whitespace, trim it before splitting. Signed-off-by: kingbri <bdashore3@proton.me> * Sampling: Add helpful log to dry_sequence_breakers Let the user know if the sequence errors out. Signed-off-by: kingbri <bdashore3@proton.me> * Sampling: Apply validators in right order Validators need to be applied in order from top to bottom, this is why the after validator was not being applied properly. Set the model to validate default params for sampler override purposes. This can be turned off if there are unclear errors. Signed-off-by: kingbri <bdashore3@proton.me> * Endpoints: Format Cleanup and semantically fix field validators Signed-off-by: kingbri <bdashore3@proton.me> * Kobold: Update validators and fix parameter application Validators on parent fields cannot see child fields. Therefore, validate using the child fields instead and alter the parent field data from there. Also fix badwordsids casting. Signed-off-by: kingbri <bdashore3@proton.me> * Sampling: Remove validate defaults and fix mirostat If a user sets an override to a non-default value, that's their own fault. Run validator on the actual mirostat_mode parameter rather than the alternate mirostat parameter. Signed-off-by: kingbri <bdashore3@proton.me> * Kobold: Rework badwordsids Currently, this serves to ban the EOS token. All other functionality was legacy, so remove it. Signed-off-by: kingbri <bdashore3@proton.me> * Model: Remove HuggingfaceConfig This was only necessary for badwordsids. All other fields are handled by exl2. Keep the class as a stub if it's needed again. Signed-off-by: kingbri <bdashore3@proton.me> * Kobold: Bump kcpp impersonation TabbyAPI supports XTC now. Signed-off-by: kingbri <bdashore3@proton.me> * Sampling: Change alias to validation_alias Reduces the probability for errors and makes the class consistent. Signed-off-by: kingbri <bdashore3@proton.me> * OAI: Use constraints for validation Instead of adding a model_validator, use greater than or equal to constraints provided by Pydantic. Signed-off-by: kingbri <bdashore3@proton.me> * Tree: Lint Signed-off-by: kingbri <bdashore3@proton.me> --------- Co-authored-by: SecretiveShell <84923604+SecretiveShell@users.noreply.github.com> Co-authored-by: kingbri <bdashore3@proton.me>	2024-10-27 11:43:41 -04:00
Brian Dashore	6e48bb420a	Model: Fix inline loading and draft key (#225) * Model: Fix inline loading and draft key There was a lack of foresight between the new config.yml and how it was structured. The "draft" key became "draft_model" without updating both the API request and inline loading keys. For the API requests, still support "draft" as legacy, but the "draft_model" key is preferred. Signed-off-by: kingbri <bdashore3@proton.me> * OAI: Add draft model dir to inline load Was not pushed before and caused errors of the kwargs being None. Signed-off-by: kingbri <bdashore3@proton.me> * Model: Fix draft args application Draft model args weren't applying since there was a reset due to how the old override behavior worked. Signed-off-by: kingbri <bdashore3@proton.me> * OAI: Change embedding model load params Use embedding_model_name to be inline with the config. Signed-off-by: kingbri <bdashore3@proton.me> * API: Fix parameter for draft model load Alias name to draft_model_name. Signed-off-by: kingbri <bdashore3@proton.me> * API: Fix parameter for template switch Add prompt_template_name to be more descriptive. Signed-off-by: kingbri <bdashore3@proton.me> * API: Fix parameter for model load Alias name to model_name for config parity. Signed-off-by: kingbri <bdashore3@proton.me> * API: Add alias documentation Signed-off-by: kingbri <bdashore3@proton.me> --------- Signed-off-by: kingbri <bdashore3@proton.me>	2024-10-24 23:35:05 -04:00
kingbri	126a44483c	Tree: Remove fasttensors Now a noop in upstream. Signed-off-by: kingbri <bdashore3@proton.me>	2024-09-30 00:18:47 -04:00
kingbri	56ce82ef77	Sampling: Add XTC support Matches with upstream. Signed-off-by: kingbri <bdashore3@proton.me>	2024-09-24 18:10:52 -04:00
TerminalMan	2cda890deb	Add health check monitoring for EXL2 errors (#206) * Add health check monitoring for EXL2 errors * Health: Format and change status code A status code of 503 makes more sense to use. ---------	2024-09-22 21:40:36 -04:00
kingbri	75af974c88	Model: Raise an error if the context length is too large The dynamic generator gave a not-so-helpful exception already which basically said to not exceed the max sequence length. Instead of possible undefined behavior, error out. Signed-off-by: kingbri <bdashore3@proton.me>	2024-09-19 22:05:56 -04:00
kingbri	24ea85b3c5	Tree: Use safe loader for YAML Loaders that read use a safe type while loaders that write use both round-trip and safe options. Also don't create module-level parsers where they're not needed. Signed-off-by: kingbri <bdashore3@proton.me>	2024-09-18 19:26:51 -04:00

193 Commits