It makes more sense to use GPU split parameters only when the user has
more than one GPU. Otherwise, set split and split_auto to False and save
the user some VRAM.
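A minimal sketch of that decision, assuming a torch-visible GPU count and
hypothetical config keys (gpu_split_auto, gpu_split); this is not the
project's exact loader code:

```python
import torch

def resolve_split(config: dict):
    """Decide whether to autosplit, manually split, or skip splitting entirely."""
    gpu_count = torch.cuda.device_count()
    if gpu_count <= 1:
        # Single GPU: no split at all, so no autosplit safety buffer is reserved
        return {"split": False, "split_auto": False, "gpu_split": None}
    if config.get("gpu_split_auto", True):
        return {"split": True, "split_auto": True, "gpu_split": None}
    return {"split": True, "split_auto": False, "gpu_split": config.get("gpu_split")}
```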
Signed-off-by: kingbri <bdashore3@proton.me>
The model loader was using more VRAM on a single GPU compared to
base exllamav2's loader. This was because single-GPU setups were still
running with the autosplit config, which allocates an extra VRAM buffer
for safe loading. Turn this off for single-GPU setups (and turn
it off by default).
This change should allow users to run models that require the
entire card, hopefully with faster T/s. For example, Mixtral at
3.75bpw increased from ~30 T/s to 50 T/s due to the extra VRAM headroom
on Windows.
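For reference, a hedged sketch of the two loading paths in exllamav2's API
(autosplit with a lazy cache vs. a plain single-device load); this mirrors
upstream examples rather than tabbyAPI's actual loader:

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Cache, ExLlamaV2Config

config = ExLlamaV2Config()
config.model_dir = "/path/to/model"  # placeholder path
config.prepare()
model = ExLlamaV2(config)

multi_gpu = False  # set True when more than one GPU should be used
if multi_gpu:
    # Autosplit: cache is created lazily and the loader reserves a safety buffer
    cache = ExLlamaV2Cache(model, lazy=True)
    model.load_autosplit(cache)
else:
    # Single GPU: plain load, no autosplit reservation
    model.load()
    cache = ExLlamaV2Cache(model)
```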
Signed-off-by: kingbri <bdashore3@proton.me>
Now that the latest exllamav2 release is required, don't add attribute
checks unless the feature isn't in the release build yet.
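A small illustration of the convention; the attribute name below is made up,
only the hasattr guard pattern is the point:

```python
from exllamav2.generator import ExLlamaV2Sampler

settings = ExLlamaV2Sampler.Settings()

# Only guard features that haven't landed in a tagged release yet
if hasattr(settings, "some_experimental_knob"):  # hypothetical attribute
    settings.some_experimental_knob = True
```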
Signed-off-by: kingbri <bdashore3@proton.me>
Add the ability to use an unsafe config flag if needed and migrate
the exl2 check to a different file within the exl2 backend code.
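A hedged sketch of what such a check plus an unsafe escape hatch could look
like; the quantization_config key and the function name are assumptions, not
the exact code:

```python
import json
import pathlib

def check_exl2_config(model_dir: str, unsafe: bool = False) -> None:
    """Refuse to load non-exl2 models unless the user opted into unsafe loading."""
    if unsafe:
        return  # user explicitly skipped the safety check
    config = json.loads((pathlib.Path(model_dir) / "config.json").read_text())
    quant = config.get("quantization_config", {})
    if quant.get("quant_method") != "exl2":
        raise ValueError(
            "Model config doesn't look like an exl2 quant. "
            "Set the unsafe flag to load it anyway."
        )
```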
Signed-off-by: kingbri <bdashore3@proton.me>
Clean up how overrides are handled and how classes are named, and adopt
exllamav2's model class to enforce the latest stable version's methods
rather than adding multiple backwards-compatibility checks.
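As an illustration of the override handling (a sketch, not the project's
exact logic): user-provided values replace defaults only when they are
actually set:

```python
def apply_overrides(defaults: dict, overrides: dict | None) -> dict:
    """Merge user overrides on top of defaults, ignoring unset (None) values."""
    merged = dict(defaults)
    for key, value in (overrides or {}).items():
        if value is not None:
            merged[key] = value
    return merged

# Example: {"max_seq_len": 4096, "rope_alpha": 1.0} + {"max_seq_len": 8192}
# -> {"max_seq_len": 8192, "rope_alpha": 1.0}
```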
Signed-off-by: kingbri <bdashore3@proton.me>
Dynamic temperature does not work if max_temp is less than or equal to
min_temp. Sampler validation will have to be refactored in the future,
so the dynamic temperature check will also change then.
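A minimal sketch of the guard this implies (parameter names mirror the
sampler fields above; the surrounding validation code is assumed):

```python
def dynatemp_enabled(min_temp: float, max_temp: float) -> bool:
    """Dynamic temperature only makes sense when the range is non-empty."""
    return max_temp > min_temp

# dynatemp_enabled(1.0, 1.0) -> False (falls back to static temperature)
# dynatemp_enabled(0.5, 1.5) -> True
```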
Signed-off-by: kingbri <bdashore3@proton.me>
The example JSON fields were changed because of the new sampler
default strategy. Fix these by manually changing the values.
Also add support for fasttensors and expose generate_window to
the API. It's recommended not to adjust generate_window, as it's
dynamically scaled based on max_seq_len by default.
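A hedged sketch of that fallback behaviour; the scaling ratio here is an
assumption, not the actual default:

```python
def resolve_generate_window(max_seq_len: int, generate_window: int | None = None) -> int:
    """Use the API-supplied window if given, otherwise derive one from max_seq_len."""
    if generate_window is not None:
        return generate_window
    # Assumed scaling: keep the window a fraction of the total context
    return max(512, max_seq_len // 4)
```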
Signed-off-by: kingbri <bdashore3@proton.me>
The previous commit iterated through multiple try conditions, which
forced the user to provide a dummy prompt template. Now, template
loading is fallback-based: run through a loop of loader functions and
return as soon as one of them succeeds.
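A small sketch of that fallback loop (loader names are illustrative; each
loader returns a template string or raises/returns None):

```python
from typing import Callable, Optional

def find_prompt_template(loaders: list[Callable[[], Optional[str]]]) -> Optional[str]:
    """Try each template loader in order and return the first one that succeeds."""
    for loader in loaders:
        try:
            template = loader()
            if template:
                return template
        except Exception:
            continue  # fall through to the next loading strategy
    return None

# Example order (hypothetical loaders): user config file, then the template
# bundled with the model, then a built-in default.
# template = find_prompt_template([load_from_config, load_from_model_dir, load_default])
```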
Signed-off-by: kingbri <bdashore3@proton.me>
Allows adjustment of the space reserved at the end of the context
before rolling it. This should be scaled up as a model's max_seq_len
goes up.
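A minimal sketch of how such a reservation interacts with context rolling
(names and the default value are assumptions):

```python
def should_roll_context(prompt_tokens: int, max_seq_len: int, reserve: int = 256) -> bool:
    """Roll (truncate the oldest context) once generation would hit the reserved tail."""
    return prompt_tokens >= max_seq_len - reserve

# With max_seq_len=4096 and reserve=256, rolling starts once the prompt
# reaches 3840 tokens; larger contexts warrant a larger reserve.
```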
Signed-off-by: kingbri <bdashore3@proton.me>
Move common functions into their own folder and refactor the backends
to use their own folders as well.
Also clean up imports and alphabetize the import statements themselves.
Finally, move colab and docker into their own folders too.
Signed-off-by: kingbri <bdashore3@proton.me>