turboderp
da3d3338e8
Logging: Fix env var parsing, formatting
2026-03-27 02:31:36 +01:00
turboderp
a3eabecf39
Logging: Add TABBY_LOG_CONSOLE_WIDTH to enable wider console log
2026-03-27 01:30:13 +01:00
turboderp
40aa82da28
API: More robust test for whether generation starts in reasoning mode
2026-03-27 01:29:17 +01:00
turboderp
ffca853d4c
ExLlamaV3: Force minimum rep_decay of 1 token, pending update to backend
2026-03-22 14:51:08 +01:00
turboderp
92cb48c38d
ExLlamaV3: Fix regression in max_seq_len limit
2026-03-22 00:34:47 +01:00
turboderp
0d1a8ba784
API: Try to guess whether streaming response should start with content or reasoning_content
2026-03-21 01:11:01 +01:00
turboderp
803ca5c681
Tree: Format
2026-03-20 20:56:43 +01:00
turboderp
088e196cbc
ExLlamaV3: Change cache size fallback value to max_seq_len, add warning to configure manually
2026-03-20 20:42:14 +01:00
turboderp
8b1bfeaba7
Model: Make sure reasoning tokens are always defined
2026-03-20 20:41:44 +01:00
turboderp
78c5993c27
ExLlamaV3: Correctly report when vision is supported but not enabled
2026-03-20 01:33:38 +01:00
turboderp
0d577b8121
Cleanup and formatting
2026-03-20 01:27:29 +01:00
turboderp
6bccc70d94
Tree: Formatting
2026-03-18 03:29:15 +01:00
turboderp
53357047ef
Delete redundant test script
2026-03-18 00:24:49 +01:00
turboderp
d2117a7c3b
Config: Pass reasoning settings in kwargs, allow for overrides via tabby_config.yml
2026-03-18 00:24:22 +01:00
turboderp
8eb6c65008
Merge branch 'main' into fork/Orion-zhen/feat_reasoning
# Conflicts:
# config_sample.yml
2026-03-17 23:05:19 +01:00
turboderp
ccd171cefb
Dependencies: Update exllamav3
2026-03-17 03:01:22 +01:00
turboderp
c2452414e1
Model: Ignore inline load requests if the requested model is already loaded
2026-03-17 03:00:27 +01:00
turboderp
6bf3670372
Model: Correctly read max_position_embeddings in nested config
Rework how max_seq_len is determined from user settings, model defaults and cache size constraint
2026-03-17 02:58:47 +01:00
turboderp
724060b058
Dependencies: Update exllamav3
2026-03-13 23:14:09 +01:00
turboderp
761e26a137
Dependencies: Update exllamav3
2026-03-05 18:09:34 +01:00
turboderp
41511f56c6
Dependencies: Update exllamav3
2026-02-09 22:54:29 +01:00
turboderp
54e3ea1fb3
Tree: Format
2026-01-20 22:57:36 +01:00
turboderp
0985c7f7b7
Sampling: Add adaptive-P params
2026-01-20 19:09:54 +01:00
turboderp
8a824cb127
Dependencies: Update exllamav3
2026-01-20 18:52:44 +01:00
kingbri
84bb1ce9fd
Dependencies: Fix FA2 wheels
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-12-19 16:52:05 -05:00
kingbri
5627f4d69e
Dependencies: Update to torch 2.9
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-12-19 15:59:40 -05:00
turboderp
f04fc6eb25
Dependencies: Update exllamav3
2025-12-16 12:58:31 +01:00
Brian
55288e5a1f
Merge pull request #402 from AlpinDale/auto-select-gpu
[startup] auto-select GPU backend
2025-12-08 22:04:26 -05:00
AlpinDale
76ffc7c458
[startup] auto-select GPU backend
2025-12-08 23:52:02 +00:00
turboderp
8b6b793bfc
Dependencies: Update exllamav3
2025-11-25 21:17:31 +01:00
Brian
685aca5a7d
Merge pull request #397 from beep39/json-schema-for-exllamav3
Constrained generation with json schema for ExllamaV3
2025-11-24 22:34:31 -05:00
kingbri
126759034e
Tree: Format
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-11-24 22:32:19 -05:00
turboderp
f50015af5e
Dependencies: Update exllamav3
2025-11-23 23:27:26 +01:00
Brian
df724fdc78
Merge pull request #393 from mefich/main
Unloading vision model of VLMs for Exllamav3 backend
2025-11-19 22:46:59 -05:00
beep39
d53ca1345a
Constrained generation with json schema for ExllamaV3
2025-11-18 02:01:31 +09:00
turboderp
fece4791ad
exllamav2: Make sure cache size is set in unpaged mode
2025-11-06 21:03:24 +01:00
turboderp
368e87eb7d
Fix exllamav3 URL
2025-11-03 12:35:13 +01:00
turboderp
c6bf59063d
Dependencies: Update exllamav3
2025-11-02 23:45:34 +01:00
mefich
37aea9de83
Update exl3 backend model.py: fix for unloading vision models
This change ensures that when a VLM is unloaded, its vision component is also unloaded.
2025-10-30 12:30:23 +05:00
turboderp
996bc8dbe1
Dependencies: Update exllamav3
2025-10-17 23:41:44 +02:00
turboderp
2539acf800
Dependencies: Update exllamav3
2025-10-15 16:01:57 +02:00
turboderp
486dd0418e
Formatting
2025-10-15 10:47:58 +02:00
turboderp
0af29d957a
Fix #390
2025-10-15 10:40:19 +02:00
kingbri
ad64942fa1
Tree: Format
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-10-14 23:49:13 -04:00
kingbri
f205349c81
Config: Fix use_as_default application
Apply the default overrides after inline config has been merged.
Do not require an inline config to apply use_as_default and other
overrides.
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-10-14 23:45:39 -04:00
kingbri
6f73a0b388
Tree: Format
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-10-14 23:06:20 -04:00
kingbri
5cb8f3ed2c
Config: Fix comments for max_seq_len and cache_size
The default is the minimum of max_position_embeddings and cache_size.
On AMD GPUs, and on NVIDIA GPUs older than Ampere, cache_size is
ignored because batching on exl2 is not supported there.
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-10-14 23:04:36 -04:00
kingbri
fdb86f4c63
ExllamaV2: Add max_seq_len empty case like ExllamaV3
Also remove the intermediate base_seq_len and target_seq_len variables
to make the code clearer.
If paged mode is off, max_seq_len becomes the prime mover since batching
is unavailable.
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-10-14 23:02:52 -04:00
kingbri
69a25d7fa6
Config + Endpoints: Make cache_size more prominent
Since cache_size is now a more important parameter for multi-user
setups, mark it as such by placing it below max_seq_len.
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-10-14 21:53:33 -04:00
kingbri
62e9fa217a
ExllamaV3: Handle max_seq_len defined and cache_size undefined case
The previous changes broke existing configs: max_seq_len was
force-overridden to 4096. This helps single-user setups, since they
do not really benefit from the split cache_size/max_seq_len mechanism
(except when batching).
cache_size is still the prime mover in exl3 due to its paging mechanism.
Ideally, for multi-user setups, cache_size should take as much VRAM
as possible and max_seq_len should be limited.
Breakdown:
cache_size and max_seq_len specified -> values
only cache_size/max_seq_len specified -> max_seq_len = cache_size and vice versa
neither specified -> cache_size = 4096, max_seq_len = min(max_position_embeddings, cache_size)
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-10-14 21:48:36 -04:00