tabbyAPI

Mirror of https://github.com/theroyallab/tabbyAPI.git (synced 2026-04-20 06:19:15 +00:00)

Author	SHA1	Message	Date
turboderp	52e093ae6c	Model: Enable max_rq_tokens (output chunking)	2025-10-05 18:54:45 +02:00
turboderp	e09a61969f	Model: Fix NCCL detection	2025-10-05 18:52:37 +02:00
kingbri	a4d02c2b70	Model: Add log messages for model loading It's useful to know the split method that the model is being loaded on. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-08-17 23:09:27 -04:00
kingbri	a3a32c30a4	Model: Add utils file Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-08-17 22:43:19 -04:00
kingbri	43f9483bc4	Model: Add tensor_parallel_backend option This allows users to choose between nccl and native depending on the GPU setup. NCCL is only available with Linux-built wheels. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-08-17 22:35:10 -04:00
Forkoz	60ae419746	Model.py TP changes	2025-08-12 21:01:54 +00:00
kingbri	fe149489af	Tree: Format Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-08-05 01:22:18 -04:00
AUTOMATIC	056527ceb3	add logprobs support for exl3	2025-08-03 11:42:32 +03:00
kingbri	0b4ca567f8	API: Persist request IDs and append full_text to finish chunk Adding these to each generation chunk helps remove redundancy and unnecessary request ID operations. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-07-25 12:27:44 -04:00
turboderp	0ae878712e	Exl3: Clear image embedding cache on unload	2025-06-25 23:56:21 +02:00
kingbri	a02d39de31	Model: Remove rogue print Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-06-17 23:09:07 -04:00
kingbri	2913ce29fc	API: Add timings to usage stats It's useful for the client to know what the T/s and total time for generation are per-request. Works with both completions and chat completions. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-06-17 22:54:51 -04:00
kingbri	5d94d4d022	Merge branch 'main' into breaking	2025-06-17 22:24:32 -04:00
turboderp	21c5af48e1	Tree: Format	2025-06-15 19:30:38 +02:00
turboderp	1c9891bf04	Exl3: Add vision capability	2025-06-15 19:22:51 +02:00
turboderp	4605c0f6bd	Common: Refactor get_image to common functions	2025-06-15 19:20:36 +02:00
turboderp	d357f100d0	Dependencies: Bump ExllamaV3	2025-06-15 19:12:45 +02:00
turboderp	a0c16bba2a	Exl2: Fix banned_strings (move outside of assign_gen_params)	2025-06-15 16:51:42 +02:00
kingbri	2096c9bad2	Model: Default max_seq_len to 4096 A common problem in TabbyAPI is that users who want to get up and running with a model often run into OOMs caused by max_seq_len. This is because model devs set max context values in the millions, which requires a lot of VRAM. To idiot-proof first-time setup, make the fallback default 4096 so users can run their models. If a user still wants to use the model's max_seq_len, set it to -1. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-06-13 14:57:24 -04:00
turboderp	691a080ac7	Dependencies: Bump ExllamaV3 and ExllamaV2	2025-05-31 23:55:04 +02:00
kingbri	0c4cc1eba3	Model: Add prompt logging to ExllamaV3 Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-17 22:05:18 -04:00
gakada	ba6248eec0	Exl3: fix add_bos in generator	2025-05-17 19:10:49 +09:00
kingbri	17f3dca6fc	Packaging: Add agnostic method to check version of packages Some packages such as ExllamaV2 and V3 require specific versions for the latest features. Rather than creating repetitive functions, create an agnostic function to check the installed package version and then prompt the user to upgrade. This is also sent to requests for loading and unloading, so keep the error short. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-17 01:04:24 -04:00
kingbri	084916c04f	Model: Fix autosplit reserve crash with GPU split ExllamaV3 does not accept autosplit_reserve and gpu_split at the same time. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-17 00:51:14 -04:00
kingbri	0858b6d4b2	Tree: Format Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-17 00:46:40 -04:00
kingbri	390daeb92f	Model: Create universal HFModel class The HFModel class serves to coalesce all config files that contain random keys which are required for model usage. Adding this base class allows us to expand as HuggingFace randomly changes their JSON schemas over time, reducing the brunt that backend devs need to feel when their next model isn't supported. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-13 18:12:38 -04:00
kingbri	bd3fec929c	Tree: Format Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-12 11:32:27 -04:00
kingbri	a524ac3c0f	Model: Fix cache mode again If statements can be difficult to work with. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-12 11:30:47 -04:00
kingbri	20cad851e9	Model: Fix param call Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-12 09:52:28 -04:00
kingbri	d15eb55f20	Model: Fix exl2 cache mode check FP16 was not included in the validation step. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-12 09:51:09 -04:00
kingbri	656af41b5d	Model: Always enable decode_special_tokens The frontend should handle the special tokens if they get emitted. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-09 22:25:50 -04:00
kingbri	42346c6b39	Sampling: Remove skip_special_tokens This parameter is way too confusing and does not make sense in the modern LLM space. Change approved by all maintainers. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-09 22:11:33 -04:00
kingbri	25c77ebf77	Model: Remove exllamav2-specific version check No longer necessary thanks to the agnostic check. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-09 22:08:15 -04:00
kingbri	638eef401a	Model: Move cache creation to a common function Prevents repetitiveness while also creating a Cache class. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-08 23:10:03 -04:00
DocShotgun	9dcde59c57	Model: Check for unsupported cache mode in exllamav2	2025-05-06 01:18:15 -07:00
DocShotgun	45b966363e	Tree: Format	2025-05-03 21:01:03 -07:00
DocShotgun	a635a719d7	Model: Enable draft model q-cache in Exl3 * Remove unneeded default fp16 cache layer import	2025-05-03 20:59:36 -07:00
DocShotgun	58e34ba4c5	Model: Exl3 cache quant settings lenient with whitespace	2025-05-03 20:35:35 -07:00
DocShotgun	68a660bdb3	Model: Initial Exl3 cache quantization support	2025-05-03 20:35:35 -07:00
turboderp	92ea7ee7cd	Model: Add draft model/speculative decoding	2025-05-04 01:27:42 +02:00
turboderp	1db2cb99cb	Model: Avoid initializing class variables	2025-05-04 01:26:42 +02:00
turboderp	0405a94a89	Model: Cast penalty range to int	2025-05-03 22:28:36 +02:00
turboderp	58c380b8ca	Model: Create generator on load	2025-05-03 18:33:37 +02:00
turboderp	0d949d00b9	Model: Set default max_batch_size	2025-05-03 18:33:37 +02:00
turboderp	8c75b29923	Model: Fix some warnings	2025-05-03 18:33:36 +02:00
kingbri	15cc480cb0	Exl3: Simplify add_bos_token handling Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-02 21:50:42 -04:00
randoentity	d8a8ccfc2a	Model: fix add_bos_token	2025-05-02 21:33:25 -04:00
kingbri	0d02af3c81	Model: Set model_dir on init Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-02 21:33:25 -04:00
kingbri	c89bea030e	Model: Add template fetching to Exl3 Use the same functionality as exl2's loader. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-02 21:33:25 -04:00
kingbri	e8f00412f6	Model: Fetch from generation_config and tokenizer_config in Exl3 Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-02 21:33:25 -04:00


312 Commits