tabbyAPI

mirror of https://github.com/theroyallab/tabbyAPI.git synced 2026-03-14 15:57:27 +00:00

Author	SHA1	Message	Date
kingbri	ad64942fa1	Tree: Format Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-10-14 23:49:13 -04:00
kingbri	f205349c81	Config: Fix use_as_default application Apply the default overrides after inline config has been merged. Do not require an inline config to apply use_as_default and other overrides. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-10-14 23:45:39 -04:00
kingbri	6f73a0b388	Tree: Format Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-10-14 23:06:20 -04:00
kingbri	5cb8f3ed2c	Config: Fix comments for max_seq_len and cache_size The default is the minimum between max_position_embeddings and cache_size. On AMD and older than Ampere NVIDIA GPUs, cache_size is ignored due to not being supported by batching on exl2. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-10-14 23:04:36 -04:00
kingbri	fdb86f4c63	ExllamaV2: Add max_seq_len empty case like ExllamaV3 Also remove the intermediate base_seq_len and target_seq_len variables to make code clearer. If paged mode is off, max_seq_len becomes the prime mover since batching is unavailable. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-10-14 23:02:52 -04:00
kingbri	69a25d7fa6	Config + Endpoints: Make cache_size more prominent Since cache_size is a more important parameter now for multi-user setups, mark it as such by placing it below max_seq_len. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-10-14 21:53:33 -04:00
kingbri	62e9fa217a	ExllamaV3: Handle max_seq_len defined and cache_size undefined case The previous changes broke existing configs and max_seq_len was force-overridden to 4096. This helps single-user setups since they do not really benefit from the split cache_size/max_seq_len mechanism (except if batching). cache_size is still the prime mover in exl3 due to its paging mechanism. Ideally, for multi-user setups, cache_size should take as much VRAM as possible and max_seq_len should be limited. Breakdown: (1) cache_size and max_seq_len specified -> values; (2) only cache_size/max_seq_len specified -> max_seq_len = cache_size and vice versa; (3) neither specified -> cache_size = 4096, max_seq_len = min(max_position_embeddings, cache_size). Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-10-14 21:48:36 -04:00
turboderp	04ca346732	Fix formatting	2025-10-14 03:11:59 +02:00
turboderp	ec50ad17ea	Merge branch 'main_seq'	2025-10-14 02:58:00 +02:00
turboderp	8abdfe7b13	Config: replace disable_output_chunking flag with output_chunking	2025-10-14 02:47:52 +02:00
turboderp	7eee3924c7	Merge remote-tracking branch 'origin/main_seq' into main_seq	2025-10-14 00:58:42 +02:00
turboderp	f73e88e9e9	Dependencies: update exllamav3	2025-10-14 00:58:14 +02:00
kingbri	85459ce600	Tree: Format Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-10-09 22:33:53 -04:00
turboderp	01a5915a7b	Dependencies: Pin Pydantic to version 2.11.0 For now. There appear to be breaking changes in 2.12.0 that affect both Formatron and FastAPI.	2025-10-08 20:43:26 +02:00
turboderp	4235f98e83	Model: Change cache_size/max_seq_len behavior - Cache size is now given only by the cache_size config option. Default is 4096 (user should always override to max out VRAM) - max_seq_len, if not overridden in the config, will default to the model's config.json - max_seq_len is reduced to be no larger than the cache	2025-10-05 22:16:01 +02:00
turboderp	d672dc2137	API: Fix race condition when client disconnects	2025-10-05 21:23:02 +02:00
turboderp	52e093ae6c	Model: Enable max_rq_tokens (output chunking)	2025-10-05 18:54:45 +02:00
turboderp	e09a61969f	Model: Fix NCCL detection	2025-10-05 18:52:37 +02:00
kingbri	7a0dddcbd9	Dependencies: Update exllamav3 v0.0.7 Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-09-30 17:34:02 -04:00
turboderp	1d3a308709	Fix wiki link in README.md	2025-08-26 13:03:18 +02:00
kingbri	d7eb580e99	Start: Fix uv check In Windows, checking for a command yields a FileNotFound error if the utility isn't found. This led to complicated logic which can be solved by using which instead. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-08-21 18:23:42 -04:00
kingbri	4036c70d75	Tree: Format Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-08-19 22:59:26 -04:00
kingbri	bd3aa5bb04	Docs: Add uv section UV is now supported as first-party in tabbyAPI's start script, so add a dedicated section to it and recommend over miniconda. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-08-19 22:57:03 -04:00
kingbri	1f4186512e	Start: Add check for uv Uv is the definitive package installation tool for Python, so add support to check for it via the start script. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-08-19 22:57:03 -04:00
kingbri	30a3cd75cf	Start: Migrate options from cu121/118 to cu12 This encapsulates more cuda versions and makes install easier for new users. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-08-19 22:56:58 -04:00
kingbri	1344726936	Docs: Sampler overrides part 2 Actually commit the edits. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-08-19 21:19:12 -04:00
Brian	86f27c9c93	Merge pull request #377 from DocShotgun/main Config: Enable safe sampler overrides by default	2025-08-18 23:12:34 -04:00
kingbri	e07df3951e	Docs: Update sampler overrides Change the sampling subsection to sampler overrides and add a warning about the default preset. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-08-18 23:06:16 -04:00
kingbri	067d63773e	Config: Move sampling higher in the list This has become a bigger priority with addition of the safe_defaults noob proofing. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-08-18 22:55:03 -04:00
DocShotgun	6fb0c2cdbd	Config: Update description for override_preset default * We provide safe_defaults as a default in config_sample.yml but not internally	2025-08-18 12:39:52 -07:00
DocShotgun	998abe5ad1	Config: Enable safe sampler overrides by default * Provides safe fallback samplers, intended for better out-of-the-box support for clients that do not pass sampler params	2025-08-18 12:32:28 -07:00
kingbri	a4d02c2b70	Model: Add log messages for model loading It's useful to know the split method that the model is being loaded on. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-08-17 23:09:27 -04:00
kingbri	a3a32c30a4	Model: Add utils file Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-08-17 22:43:19 -04:00
Brian	05791a25a1	Merge pull request #375 from Ph0rk0z/patch-1 experimental: native exllamav3 TP, no fuss	2025-08-17 22:37:25 -04:00
kingbri	43f9483bc4	Model: Add tensor_parallel_backend option This allows for users to use nccl or native depending on the GPU setup. NCCL is only available with Linux built wheels. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-08-17 22:35:10 -04:00
kingbri	b9952f319e	Merge branch 'main' into exl3-tp	2025-08-17 21:21:40 -04:00
kingbri	f2a39e3a61	Dependencies: Update exllama, torch, and flash attention Torch: 2.8 ExllamaV2: v0.3.2 torch 2.8 ExllamaV3: v0.0.6 torch 2.8 FA: v2.8.3 Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-08-17 21:19:23 -04:00
Forkoz	60ae419746	Model.py TP changes	2025-08-12 21:01:54 +00:00
Brian	6623dbcd86	Merge pull request #373 from AUTOMATIC1111/exl3-logprobs add logprobs support for exl3	2025-08-05 01:24:06 -04:00
kingbri	fe149489af	Tree: Format Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-08-05 01:22:18 -04:00
Brian	83f778db2d	Merge pull request #374 from DocShotgun/main Templating: Support chat_template.jinja	2025-08-05 01:18:25 -04:00
DocShotgun	81a115b781	Templating: Support chat_template.jinja	2025-08-03 16:10:08 -07:00
AUTOMATIC	056527ceb3	add logprobs support for exl3	2025-08-03 11:42:32 +03:00
Brian	03d72a37be	Merge pull request #371 from DocShotgun/main Config: Remove developer arg cuda_malloc_backend	2025-08-01 14:02:57 -04:00
DocShotgun	102af306e5	Config: Remove developer arg cuda_malloc_backend * cudaMallocAsync is now enabled by default on supported configurations	2025-08-01 10:59:13 -07:00
kingbri	113643c0df	Main: Enable cudaMallocAsync backend by default Works on cuda 12.4 and up. If CUDA doesn't exist, then don't enable the backend. This is an env var that needs to be set, so it's not really possible to set it via config.yml. This used to be experimental, but it's probably fine to keep it enabled since it only provides a benefit. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-07-27 22:31:38 -04:00
kingbri	0b4ca567f8	API: Persist request IDs and append full_text to finish chunk Adding these to each generation chunk helps remove redundancy and unnecessary request ID operations. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-07-25 12:27:44 -04:00
kingbri	e77fa0b7a8	Docs: Edit inline loading for breaking changes Add the model key for the YAML examples. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-07-24 18:11:42 -04:00
kingbri	ab04a6ed60	Dependencies: Bump ExllamaV3 v0.0.5 Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-07-18 22:56:35 -04:00
kingbri	bf936f5c39	Dependencies: Update exllamav2 v0.3.2 Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-07-13 23:33:12 -04:00
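
The breakdown in commit 62e9fa217a (cache_size and max_seq_len resolution for ExllamaV3) can be read as a small decision table. The following is a minimal sketch of that table only, not the project's actual code: the function name `resolve_lengths`, its signature, and the constant `DEFAULT_CACHE_SIZE` are hypothetical, and the sketch applies no clamping beyond what the commit message lists.

```python
from typing import Optional, Tuple

# Default named in the commit message; the constant name is hypothetical.
DEFAULT_CACHE_SIZE = 4096


def resolve_lengths(
    cache_size: Optional[int],
    max_seq_len: Optional[int],
    max_position_embeddings: int,
) -> Tuple[int, int]:
    """Return (cache_size, max_seq_len) following the commit's breakdown."""
    # Both specified: use the values as given.
    if cache_size is not None and max_seq_len is not None:
        return cache_size, max_seq_len
    # Only cache_size specified: max_seq_len mirrors it.
    if cache_size is not None:
        return cache_size, cache_size
    # Only max_seq_len specified: cache_size mirrors it.
    if max_seq_len is not None:
        return max_seq_len, max_seq_len
    # Neither specified: default cache, sequence length capped by the model.
    cache_size = DEFAULT_CACHE_SIZE
    return cache_size, min(max_position_embeddings, cache_size)


if __name__ == "__main__":
    # A model advertising 32768 positions, with only max_seq_len configured.
    print(resolve_lengths(None, 8192, 32768))
```

Under these assumptions, a config that sets only one of the two values gets a cache exactly as large as one full-length sequence, which matches the single-user case the commit message describes.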

1092 Commits