Commit Graph

1113 Commits

Author SHA1 Message Date
turboderp
0985c7f7b7 Sampling: Add adaptive-P params 2026-01-20 19:09:54 +01:00
turboderp
8a824cb127 Dependencies: Update exllamav3 2026-01-20 18:52:44 +01:00
kingbri
84bb1ce9fd Dependencies: Fix FA2 wheels
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-12-19 16:52:05 -05:00
kingbri
5627f4d69e Dependencies: Update to torch 2.9
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-12-19 15:59:40 -05:00
turboderp
f04fc6eb25 Dependencies: Update exllamav3 2025-12-16 12:58:31 +01:00
Brian
55288e5a1f Merge pull request #402 from AlpinDale/auto-select-gpu
[startup] auto-select GPU backend
2025-12-08 22:04:26 -05:00
AlpinDale
76ffc7c458 [startup] auto-select GPU backend 2025-12-08 23:52:02 +00:00
turboderp
8b6b793bfc Dependencies: Update exllamav3 2025-11-25 21:17:31 +01:00
Brian
685aca5a7d Merge pull request #397 from beep39/json-schema-for-exllamav3
Constrained generation with json schema for ExllamaV3
2025-11-24 22:34:31 -05:00
kingbri
126759034e Tree: Format
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-11-24 22:32:19 -05:00
turboderp
f50015af5e Dependencies: Update exllamav3 2025-11-23 23:27:26 +01:00
Brian
df724fdc78 Merge pull request #393 from mefich/main
Unloading vision model of VLMs for Exllamav3 backend
2025-11-19 22:46:59 -05:00
beep39
d53ca1345a Constrained generation with json schema for ExllamaV3 2025-11-18 02:01:31 +09:00
turboderp
fece4791ad exllamav2: Make sure cache size is set in unpaged mode 2025-11-06 21:03:24 +01:00
turboderp
368e87eb7d Fix exllamav3 URL 2025-11-03 12:35:13 +01:00
turboderp
c6bf59063d Dependencies: Update exllamav3 2025-11-02 23:45:34 +01:00
mefich
37aea9de83 Update exl3 backend model.py: fix for unloading vision models
This change ensures that when a VLM is unloaded, its vision component is also unloaded.
2025-10-30 12:30:23 +05:00
turboderp
996bc8dbe1 Dependencies: Update exllamav3 2025-10-17 23:41:44 +02:00
turboderp
2539acf800 Dependencies: Update exllamav3 2025-10-15 16:01:57 +02:00
turboderp
486dd0418e Formatting 2025-10-15 10:47:58 +02:00
turboderp
0af29d957a Fix #390 2025-10-15 10:40:19 +02:00
kingbri
ad64942fa1 Tree: Format
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-10-14 23:49:13 -04:00
kingbri
f205349c81 Config: Fix use_as_default application
Apply the default overrides after inline config has been merged.

Do not require an inline config to apply use_as_default and other
overrides.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-10-14 23:45:39 -04:00
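The merge order this commit describes can be sketched roughly as follows; the function name, the dict-based config shape, and the use_as_default key layout are all assumptions for illustration, not tabbyAPI's actual API:

```python
# Hypothetical sketch: merge the inline config first, then apply
# use_as_default overrides last -- and apply them even when no
# inline config is present.
def load_config(global_config, inline_config=None):
    merged = dict(global_config)
    merged.update(inline_config or {})  # inline config is optional
    defaults = merged.pop("use_as_default", {})
    for key, value in defaults.items():
        # defaults only fill keys the merged config did not set
        merged.setdefault(key, value)
    return merged
```

The point of the fix is the ordering: defaults are applied after the merge, and the merge step tolerates a missing inline config.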
kingbri
6f73a0b388 Tree: Format
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-10-14 23:06:20 -04:00
kingbri
5cb8f3ed2c Config: Fix comments for max_seq_len and cache_size
The default is the minimum of max_position_embeddings and cache_size.
On AMD GPUs and NVIDIA GPUs older than Ampere, cache_size is ignored
because it is not supported by batching on exl2.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-10-14 23:04:36 -04:00
kingbri
fdb86f4c63 ExllamaV2: Add max_seq_len empty case like ExllamaV3
Also remove the intermediate base_seq_len and target_seq_len variables
to make code clearer.

If paged mode is off, max_seq_len becomes the prime mover since batching
is unavailable.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-10-14 23:02:52 -04:00
kingbri
69a25d7fa6 Config + Endpoints: Make cache_size more prominent
Since cache_size is a more important parameter now for multi-user
setups, mark it as such by placing it below max_seq_len.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-10-14 21:53:33 -04:00
kingbri
62e9fa217a ExllamaV3: Handle max_seq_len defined and cache_size undefined case
The previous changes broke existing configs: max_seq_len was
force-overridden to 4096. This helps single-user setups, since they
do not really benefit from the split cache_size/max_seq_len mechanism
(except when batching).

cache_size is still the prime mover in exl3 due to its paging mechanism.
Ideally, for multi-user setups, cache_size should take as much VRAM
as possible and max_seq_len should be limited.

Breakdown:
cache_size and max_seq_len specified -> values
only cache_size/max_seq_len specified -> max_seq_len = cache_size and vice versa
neither specified -> cache_size = 4096, max_seq_len = min(max_position_embeddings, cache_size)

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-10-14 21:48:36 -04:00
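The breakdown above can be written as a small resolution function. This is a sketch of the listed rules only; the function and parameter names are assumed for illustration:

```python
# Hypothetical sketch of the cache_size/max_seq_len resolution rules
# from the commit message above.
def resolve_cache(cache_size=None, max_seq_len=None, max_position_embeddings=32768):
    if cache_size is None and max_seq_len is None:
        # neither specified
        cache_size = 4096
        max_seq_len = min(max_position_embeddings, cache_size)
    elif cache_size is None:
        # only max_seq_len specified -> mirror it
        cache_size = max_seq_len
    elif max_seq_len is None:
        # only cache_size specified -> mirror it
        max_seq_len = cache_size
    # both specified -> use the values as given
    return cache_size, max_seq_len
```

For a multi-user setup the commit recommends specifying both explicitly: a large cache_size to fill VRAM and a deliberately limited max_seq_len.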
turboderp
04ca346732 Fix formatting 2025-10-14 03:11:59 +02:00
turboderp
ec50ad17ea Merge branch 'main_seq' 2025-10-14 02:58:00 +02:00
turboderp
8abdfe7b13 Config: replace disable_output_chunking flag with output_chunking 2025-10-14 02:47:52 +02:00
turboderp
7eee3924c7 Merge remote-tracking branch 'origin/main_seq' into main_seq 2025-10-14 00:58:42 +02:00
turboderp
f73e88e9e9 Dependencies: update exllamav3 2025-10-14 00:58:14 +02:00
kingbri
85459ce600 Tree: Format
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-10-09 22:33:53 -04:00
turboderp
01a5915a7b Dependencies: Pin Pydantic to version 2.11.0
For now. There appear to be breaking changes in 2.12.0 that affect both Formatron and FastAPI.
2025-10-08 20:43:26 +02:00
turboderp
4235f98e83 Model: Change cache_size/max_seq_len behavior
- Cache size is now given only by the cache_size config option. Default is 4096 (user should always override to max out VRAM)
- max_seq_len, if not overridden in the config, will default to the model's config.json
- max_seq_len is reduced to be no larger than the cache
2025-10-05 22:16:01 +02:00
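A minimal sketch of the three bullet points above, assuming a dict-style config and a hypothetical resolve_lengths helper (not tabbyAPI's actual code):

```python
# Hypothetical sketch: cache_size comes only from the config (default
# 4096), max_seq_len falls back to the model's config.json value, and
# max_seq_len is clamped to the cache size.
def resolve_lengths(config, model_max_position_embeddings):
    cache_size = config.get("cache_size", 4096)  # user should override to max out VRAM
    max_seq_len = config.get("max_seq_len", model_max_position_embeddings)
    return cache_size, min(max_seq_len, cache_size)
```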
turboderp
d672dc2137 API: Fix race condition when client disconnects 2025-10-05 21:23:02 +02:00
turboderp
52e093ae6c Model: Enable max_rq_tokens (output chunking) 2025-10-05 18:54:45 +02:00
turboderp
e09a61969f Model: Fix NCCL detection 2025-10-05 18:52:37 +02:00
kingbri
7a0dddcbd9 Dependencies: Update exllamav3
v0.0.7

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-09-30 17:34:02 -04:00
turboderp
1d3a308709 Fix wiki link in README.md 2025-08-26 13:03:18 +02:00
kingbri
d7eb580e99 Start: Fix uv check
On Windows, checking for a command raises a FileNotFoundError if
the utility isn't found. This led to complicated logic, which can
be avoided by using which instead.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-08-21 18:23:42 -04:00
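The which-based approach maps directly onto Python's standard library: shutil.which performs a which-style PATH lookup and returns None, rather than raising, when the command is missing. A minimal sketch, with uv_available as a hypothetical helper name:

```python
import shutil

# shutil.which returns the resolved path of the executable, or None if
# it is not on PATH -- no FileNotFoundError to handle on Windows.
def uv_available() -> bool:
    return shutil.which("uv") is not None
```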
kingbri
4036c70d75 Tree: Format
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-08-19 22:59:26 -04:00
kingbri
bd3aa5bb04 Docs: Add uv section
uv is now supported first-party in tabbyAPI's start script, so add
a dedicated section for it and recommend it over miniconda.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-08-19 22:57:03 -04:00
kingbri
1f4186512e Start: Add check for uv
uv is the definitive package installation tool for Python, so add
support for checking for it via the start script.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-08-19 22:57:03 -04:00
kingbri
30a3cd75cf Start: Migrate options from cu121/118 to cu12
This covers more CUDA versions and makes installation easier for
new users.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-08-19 22:56:58 -04:00
kingbri
1344726936 Docs: Sampler overrides part 2
Actually commit the edits.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-08-19 21:19:12 -04:00
Brian
86f27c9c93 Merge pull request #377 from DocShotgun/main
Config: Enable safe sampler overrides by default
2025-08-18 23:12:34 -04:00
kingbri
e07df3951e Docs: Update sampler overrides
Change the sampling subsection to sampler overrides and add a warning
about the default preset.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-08-18 23:06:16 -04:00
kingbri
067d63773e Config: Move sampling higher in the list
This has become a bigger priority with the addition of the
safe_defaults noob-proofing.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-08-18 22:55:03 -04:00