Commit Graph

  • 64ad702416 Dependencies: Pin pydantic again (>2.11 breaks docker image) [main] turboderp 2026-05-10 01:41:03 +02:00
  • 5818311d06 Dependencies: Pin correct xformers version torch 2.9 turboderp 2026-05-10 01:21:37 +02:00
  • 553c4e7cbb Docker: Serve on 0.0.0.0 by default turboderp 2026-05-09 23:22:56 +02:00
  • 5d964494b6 Merge remote-tracking branch 'origin/main' turboderp 2026-05-09 23:18:52 +02:00
  • 4a8cb08a24 Dependencies: Include triton and xformers turboderp 2026-05-09 23:14:30 +02:00
  • fd9591133d Dependencies: Update exllamav3, unpin pydantic turboderp 2026-05-09 23:01:07 +02:00
  • 54c1e56019 Update config_sample.yml (#418) RodriMora 2026-05-09 20:21:57 +01:00
  • 09f36f9c05 fix: prevent xformers from pulling cu130 wheels on cu128 hosts (#420) Josh 2026-05-09 12:21:17 -07:00
  • bc5de12c82 Dependencies: Fix Windows FA2 wheel URL for cp312 turboderp 2026-05-05 10:02:49 +02:00
  • 59494106c9 Dependencies: Update exllamav3 turboderp 2026-05-03 00:01:59 +02:00
  • 51b67595f4 Dependencies: Switch to mjun0812 flash-attn wheels turboderp 2026-05-03 00:01:29 +02:00
  • 6e97aa5fc1 Model: Fix model loading progress display when draft enabled turboderp 2026-05-02 20:30:38 +02:00
  • c06a6fbf7f API: Accept JSON schema in request.response_format.json_schema, delay JSON filter until start of content block turboderp 2026-05-02 20:29:59 +02:00
  • d0103c19a7 Dependencies: Bump exllamav3 turboderp 2026-04-29 00:55:59 +02:00
  • e909f7ecdb ExLlamaV3: Respect device split when loading draft model turboderp 2026-04-25 01:51:46 +02:00
  • 6aa842a1b2 Dependencies: Update exllamav3 turboderp 2026-04-20 23:11:30 +02:00
  • 3e3d7ccd54 Tools: Add step3_5 alias (qwen3_coder tool format) turboderp 2026-04-18 18:23:00 +02:00
  • ed41c51909 API: Prevent race condition when multiple chat requests try to inline-load the same model turboderp 2026-04-18 18:18:42 +02:00
  • 5b2b707af9 exllamav3: Account for bsz=2 in autosplit turboderp 2026-04-18 17:07:34 +02:00
  • 9ebbe06f29 exllamav3: Supply max_chunk_size when loading model turboderp 2026-04-18 13:20:12 +02:00
  • f74f16a5c2 Config: Make recurrent cache size configurable turboderp 2026-04-17 02:40:22 +02:00
  • bd589272cc Config: Make cuda_malloc_async configurable again, change import order to make sure config is loaded before torch is imported turboderp 2026-04-17 02:38:27 +02:00
  • 32eed618dc Dependencies: Add requests turboderp 2026-04-12 13:51:54 +02:00
  • 1a4896ce66 Tree: Format turboderp 2026-04-12 13:47:05 +02:00
  • 510bf7bf6c Update README.md turboderp 2026-04-12 13:44:26 +02:00
  • f1a2416da5 OAI endpoints: Add option to suppress header after reasoning start token (e.g. Gemma4's "thought\n") [main_tools] turboderp 2026-04-12 04:12:53 +02:00
  • 2636b445f0 Tree: Format turboderp 2026-04-12 03:33:14 +02:00
  • bb64f8f18e Dependencies: Update exllamav3 turboderp 2026-04-12 03:31:54 +02:00
  • 3a42c1756c ExLlamaV2: Use new disconnect handler turboderp 2026-04-10 22:04:21 +02:00
  • b21100f971 ExLlamaV3: Fix disconnected request handling regression turboderp 2026-04-10 22:03:19 +02:00
  • 08f92167de Tools: Updated/fixed Gemma4 tool parser mindkrypted 2026-04-10 22:02:34 +02:00
  • 5517cb5b9e Templates: Revert add_bos_token fix turboderp 2026-04-10 03:53:58 +02:00
  • 7fedc179f0 Templates: Make sure add_bos_token=False is respected turboderp 2026-04-10 03:14:29 +02:00
  • 27d29209c6 Tools: Add Gemma4 parser turboderp 2026-04-10 00:16:58 +02:00
  • 55124d0fc6 Config: Add force_enable_thinking turboderp 2026-04-10 00:16:40 +02:00
  • db9048e59b Docs: Tool calling turboderp 2026-04-08 19:39:42 +02:00
  • 79d581e1f5 OAI endpoints: More rework turboderp 2026-04-02 01:26:44 +02:00
  • c315f6b73e OAI endpoints: Correctly propagate exceptions in non-streaming mode turboderp 2026-04-01 12:27:07 +02:00
  • 455c09932f OAI endpoints: Fix regression for non-reasoning models turboderp 2026-04-01 00:08:39 +02:00
  • 0409064028 Tools: Refactor and further simplify tool parsing turboderp 2026-04-01 00:07:44 +02:00
  • b6428b1676 Seq: Allow longer strings in log turboderp 2026-03-31 18:18:07 +02:00
  • 112ab69002 Fix comments turboderp 2026-03-31 14:43:55 +02:00
  • bc66ba4b8b Merge branch 'main' into main_tools turboderp 2026-03-30 23:07:53 +02:00
  • c887ae88fc Dependencies: Update exllamav3 turboderp 2026-03-30 23:07:00 +02:00
  • a7c7934ec3 Tool parsing: Include outer <tool_call> tags in raw text sent to parser turboderp 2026-03-30 04:05:15 +02:00
  • 41ed1e4881 Seq: Sanitize extra log data turboderp 2026-03-30 03:36:30 +02:00
  • 02a700e065 ExLlamaV3: Limit MMEmbedding cache size turboderp 2026-03-30 03:35:46 +02:00
  • ba4309b948 ExLlamaV3: Replace MMEmbedding lru_cache with dict to avoid storing arbitrarily large uuencoded images as keys turboderp 2026-03-30 02:55:21 +02:00
  • a035bc9e94 Model: Fix regression turboderp 2026-03-30 02:37:27 +02:00
  • 9ee5ded218 OAI: Log raw requests turboderp 2026-03-30 01:23:16 +02:00
  • 357eebffd2 Logger: Fix invalid escape sequence (gave syntax warning) turboderp 2026-03-30 00:33:01 +02:00
  • 9f565562dd Add inference test scripts turboderp 2026-03-30 00:23:25 +02:00
  • 179479199b Rework tool calls and OAI chat completions turboderp 2026-03-30 00:17:47 +02:00
  • aa54098f26 Ruff: Format (line length) turboderp 2026-03-29 23:53:58 +02:00
  • 2a1503b283 Logging: Use debug level for Seq instead of verbose turboderp 2026-03-29 18:51:34 +02:00
  • 47d08729ed Ruff: Raise line length limit to 100 turboderp 2026-03-28 19:49:17 +01:00
  • 4b3c74782d Fix bad merge turboderp 2026-03-28 12:47:26 +01:00
  • b4dfd2e86f Fix logging turboderp 2026-03-28 01:13:23 +01:00
  • 56378b946d Merge branch 'fork/devnen/full-tool-calling-support' into main_seqlog turboderp 2026-03-28 01:06:54 +01:00
  • f3787de6a6 Ruff: Format turboderp 2026-03-27 21:47:24 +01:00
  • 83127ab4f8 Logging: Log messages via Seq wrapper turboderp 2026-03-27 21:38:24 +01:00
  • c32a628917 Logging: Add Seq wrapper turboderp 2026-03-27 21:08:26 +01:00
  • 1a7191702d Dependencies: Update exllamav3 turboderp 2026-03-27 02:54:42 +01:00
  • da3d3338e8 Logging: Fix env var parsing, formatting turboderp 2026-03-27 02:31:36 +01:00
  • a3eabecf39 Logging: Add TABBY_LOG_CONSOLE_WIDTH to enable wider console log turboderp 2026-03-27 01:30:13 +01:00
  • 40aa82da28 API: More robust test for whether generation starts in reasoning mode turboderp 2026-03-27 01:29:17 +01:00
  • ffca853d4c ExLlamaV3: Force minimum rep_decay of 1 token, pending update to backend turboderp 2026-03-22 14:51:08 +01:00
  • 92cb48c38d ExLlamaV3: Fix regression in max_seq_len limit turboderp 2026-03-22 00:34:47 +01:00
  • 0d1a8ba784 API: Try to guess whether streaming response should start with content or reasoning_content turboderp 2026-03-21 01:11:01 +01:00
  • 803ca5c681 Tree: Format turboderp 2026-03-20 20:56:43 +01:00
  • 088e196cbc ExLlamaV3: Change cache size fallback value to max_seq_len, add warning to configure manually turboderp 2026-03-20 20:42:14 +01:00
  • 8b1bfeaba7 Model: Make sure reasoning tokens are always defined turboderp 2026-03-20 20:41:44 +01:00
  • 78c5993c27 ExLlamaV3: Correctly report when vision is supported but not enabled turboderp 2026-03-20 01:33:14 +01:00
  • 0d577b8121 Cleanup and formatting turboderp 2026-03-20 01:27:29 +01:00
  • 6bccc70d94 Tree: Formatting turboderp 2026-03-18 03:29:15 +01:00
  • 53357047ef Delete redundant test script turboderp 2026-03-18 00:24:49 +01:00
  • d2117a7c3b Config: Pass reasoning settings in kwargs, allow for overrides via tabby_config.yml turboderp 2026-03-18 00:24:22 +01:00
  • 8eb6c65008 Merge branch 'main' into fork/Orion-zhen/feat_reasoning turboderp 2026-03-17 23:05:19 +01:00
  • ccd171cefb Dependencies: Update exllamav3 turboderp 2026-03-17 03:01:22 +01:00
  • c2452414e1 Model: Ignore inline load requests if the requested model is already loaded turboderp 2026-03-17 03:00:27 +01:00
  • 6bf3670372 Model: Correctly read max_position_embeddings in nested config turboderp 2026-03-17 02:58:47 +01:00
  • 724060b058 Dependencies: Update exllamav3 turboderp 2026-03-13 23:14:09 +01:00
  • 761e26a137 Dependencies: Update exllamav3 turboderp 2026-03-05 18:09:34 +01:00
  • a2c7d81686 Broader model compatibility, tool_choice support, bug fixes and cleanup devnen 2026-02-14 16:15:02 +01:00
  • 87bbe0fac2 Full tool-calling support: XML parsing, streaming compliance, Pydantic fix, inference abort fix devnen 2026-02-14 14:26:57 +01:00
  • 41511f56c6 Dependencies: Update exllamav3 turboderp 2026-02-09 22:54:29 +01:00
  • 54e3ea1fb3 Tree: Format turboderp 2026-01-20 22:57:36 +01:00
  • 0985c7f7b7 Sampling: Add adaptive-P params turboderp 2026-01-20 19:09:54 +01:00
  • 8a824cb127 Dependencies: Update exllamav3 turboderp 2026-01-20 18:52:44 +01:00
  • 84bb1ce9fd Dependencies: Fix FA2 wheels kingbri 2025-12-19 16:52:05 -05:00
  • 5627f4d69e Dependencies: Update to torch 2.9 kingbri 2025-12-19 15:59:40 -05:00
  • f04fc6eb25 Dependencies: Update exllamav3 turboderp 2025-12-16 12:58:31 +01:00
  • 55288e5a1f Merge pull request #402 from AlpinDale/auto-select-gpu Brian 2025-12-08 22:04:26 -05:00
  • 76ffc7c458 [startup] auto-select GPU backend AlpinDale 2025-12-08 23:52:02 +00:00
  • 8b6b793bfc Dependencies: Update exllamav3 turboderp 2025-11-25 21:17:31 +01:00
  • 685aca5a7d Merge pull request #397 from beep39/json-schema-for-exllamav3 Brian 2025-11-24 22:34:31 -05:00
  • 126759034e Tree: Format kingbri 2025-11-24 22:32:19 -05:00
  • f50015af5e Dependencies: Update exllamav3 turboderp 2025-11-23 23:27:26 +01:00
  • df724fdc78 Merge pull request #393 from mefich/main Brian 2025-11-19 22:46:59 -05:00
  • d53ca1345a Constrained generation with json schema for ExllamaV3 beep39 2025-11-18 01:57:54 +09:00