turboderp
41ed1e4881
Seq: Sanitize extra log data
2026-03-30 03:36:30 +02:00
turboderp
02a700e065
ExLlamaV3: Limit MMEmbedding cache size
2026-03-30 03:35:46 +02:00
turboderp
ba4309b948
ExLlamaV3: Replace MMEmbedding lru_cache with dict to avoid storing arbitrarily large uuencoded images as keys
2026-03-30 02:55:21 +02:00
turboderp
a035bc9e94
Model: Fix regression
2026-03-30 02:37:27 +02:00
turboderp
9ee5ded218
OAI: Log raw requests
2026-03-30 01:23:16 +02:00
turboderp
357eebffd2
Logger: Fix invalid escape sequence (gave syntax warning)
2026-03-30 00:33:01 +02:00
turboderp
9f565562dd
Add inference test scripts
2026-03-30 00:23:25 +02:00
turboderp
179479199b
Rework tool calls and OAI chat completions
- move tool config from template_vars to separate yml config
- new per-gen stream collector used for both streaming and non-streaming requests to ensure logic is consistent for both
- move responsibility for switching between phases to stream collector
- collect tool calls during streaming and parse at the end of each gen
- prevent streaming empty content spans (be nice to clients)
- correctly aggregate usage stats for n>1 requests, always emit with last chunk in last gen to finish
- collect logprobs in model wrapper and correctly handle logprobs for multi-token chars etc.
- respect top_logprobs argument in request
- handle a number of edge cases like <think> tag being part of held string, etc.
- retain tool parsing and inference-abort fixes from #413, apply similar fix to non-stream request as well
Still TODO:
- testing and validation with more models and tool schemas (tested on Qwen so far)
- enable JSON constraint for JSON tool models
- possibly some pydantification
- documentation
2026-03-30 00:22:55 +02:00
turboderp
aa54098f26
Ruff: Format (line length)
2026-03-30 00:19:07 +02:00
turboderp
2a1503b283
Logging: Use debug level for Seq instead of verbose
2026-03-29 18:51:57 +02:00
turboderp
47d08729ed
Ruff: Raise line length limit to 100
2026-03-28 19:49:17 +01:00
turboderp
4b3c74782d
Fix bad merge
2026-03-28 12:47:26 +01:00
turboderp
b4dfd2e86f
Fix logging
2026-03-28 01:13:23 +01:00
turboderp
56378b946d
Merge branch 'fork/devnen/full-tool-calling-support' into main_seqlog
# Conflicts:
# common/templating.py
# endpoints/OAI/utils/chat_completion.py
# endpoints/OAI/utils/tools.py
2026-03-28 01:06:54 +01:00
turboderp
f3787de6a6
Ruff: Format
2026-03-27 21:47:24 +01:00
turboderp
83127ab4f8
Logging: Log messages via Seq wrapper
2026-03-27 21:38:47 +01:00
turboderp
c32a628917
Logging: Add Seq wrapper
2026-03-27 21:38:47 +01:00
turboderp
1a7191702d
Dependencies: Update exllamav3
2026-03-27 02:54:42 +01:00
turboderp
da3d3338e8
Logging: Fix env var parsing, formatting
2026-03-27 02:31:36 +01:00
turboderp
a3eabecf39
Logging: Add TABBY_LOG_CONSOLE_WIDTH to enable wider console log
2026-03-27 01:30:13 +01:00
turboderp
40aa82da28
API: More robust test for whether generation starts in reasoning mode
2026-03-27 01:29:17 +01:00
turboderp
ffca853d4c
ExLlamaV3: Force minimum rep_decay of 1 token, pending update to backend
2026-03-22 14:51:08 +01:00
turboderp
92cb48c38d
ExLlamaV3: Fix regression in max_seq_len limit
2026-03-22 00:34:47 +01:00
turboderp
0d1a8ba784
API: Try to guess whether streaming response should start with content or reasoning_content
2026-03-21 01:11:01 +01:00
turboderp
803ca5c681
Tree: Format
2026-03-20 20:56:43 +01:00
turboderp
088e196cbc
ExLlamaV3: Change cache size fallback value to max_seq_len, add warning to configure manually
2026-03-20 20:42:14 +01:00
turboderp
8b1bfeaba7
Model: Make sure reasoning tokens are always defined
2026-03-20 20:41:44 +01:00
turboderp
78c5993c27
ExLlamaV3: Correctly report when vision is supported but not enabled
2026-03-20 01:33:38 +01:00
turboderp
0d577b8121
Cleanup and formatting
2026-03-20 01:27:29 +01:00
turboderp
6bccc70d94
Tree: Formatting
2026-03-18 03:29:15 +01:00
turboderp
53357047ef
Delete redundant test script
2026-03-18 00:24:49 +01:00
turboderp
d2117a7c3b
Config: Pass reasoning settings in kwargs, allow for overrides via tabby_config.yml
2026-03-18 00:24:22 +01:00
turboderp
8eb6c65008
Merge branch 'main' into fork/Orion-zhen/feat_reasoning
# Conflicts:
# config_sample.yml
2026-03-17 23:05:19 +01:00
turboderp
ccd171cefb
Dependencies: Update exllamav3
2026-03-17 03:01:22 +01:00
turboderp
c2452414e1
Model: Ignore inline load requests if the requested model is already loaded
2026-03-17 03:00:27 +01:00
turboderp
6bf3670372
Model: Correctly read max_position_embeddings in nested config
Rework how max_seq_len is determined from user settings, model defaults and cache size constraint
2026-03-17 02:58:47 +01:00
turboderp
724060b058
Dependencies: Update exllamav3
2026-03-13 23:14:09 +01:00
turboderp
761e26a137
Dependencies: Update exllamav3
2026-03-05 18:09:34 +01:00
devnen
a2c7d81686
Broader model compatibility, tool_choice support, bug fixes and cleanup
2026-02-14 16:19:59 +01:00
devnen
87bbe0fac2
Full tool-calling support: XML parsing, streaming compliance, Pydantic fix, inference abort fix
2026-02-14 14:26:57 +01:00
turboderp
41511f56c6
Dependencies: Update exllamav3
2026-02-09 22:54:29 +01:00
turboderp
54e3ea1fb3
Tree: Format
2026-01-20 22:57:36 +01:00
turboderp
0985c7f7b7
Sampling: Add adaptive-P params
2026-01-20 19:09:54 +01:00
turboderp
8a824cb127
Dependencies: Update exllamav3
2026-01-20 18:52:44 +01:00
kingbri
84bb1ce9fd
Dependencies: Fix FA2 wheels
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-12-19 16:52:05 -05:00
kingbri
5627f4d69e
Dependencies: Update to torch 2.9
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-12-19 15:59:40 -05:00
turboderp
f04fc6eb25
Dependencies: Update exllamav3
2025-12-16 12:58:31 +01:00
Brian
55288e5a1f
Merge pull request #402 from AlpinDale/auto-select-gpu
[startup] auto-select GPU backend
2025-12-08 22:04:26 -05:00
AlpinDale
76ffc7c458
[startup] auto-select GPU backend
2025-12-08 23:52:02 +00:00
turboderp
8b6b793bfc
Dependencies: Update exllamav3
2025-11-25 21:17:31 +01:00