Commit Graph

  • 3cf468c283 Actions: Fix docker buildx casing issue [main] kingbri 2026-06-26 21:21:05 -04:00
  • 7e4ccd5e8c Actions: Point to GHCR cache instead of GHA cache kingbri 2026-06-26 21:13:46 -04:00
  • a3bd248e08 Actions: Use GHCR as Docker layer cache kingbri 2026-06-26 20:50:30 -04:00
  • 79126f904c API: Fix v1/token/encode endpoint after regression suspicious-pineapple 2026-06-27 02:40:35 +02:00
  • 538654bfcb Add tests turboderp 2026-06-27 02:37:30 +02:00
  • 9b6ffdc3b2 exllamav3: Expose loop detect option, enable 800-token window by default turboderp 2026-06-16 23:30:16 +02:00
  • d2d87bb9e0 API: Fix error message/code when context length exceeded turboderp 2026-06-15 20:36:25 +02:00
  • c1655d1234 Dependencies: Update exllamav3 turboderp 2026-06-14 20:07:08 +02:00
  • 202fbdc6d2 Merge remote-tracking branch 'origin/main' turboderp 2026-06-14 16:16:07 +02:00
  • afec2e354f Fix git error, missing file turboderp 2026-06-14 16:15:54 +02:00
  • 5485088231 add cu13 build (#423) lavd 2026-06-12 19:47:33 -04:00
  • 671c12d78c API: Reject oversized prompts with error code 400 before committing to EventSourceResponse turboderp 2026-06-13 01:34:24 +02:00
  • ddd2c409ad Logging: Add draft metrics turboderp 2026-06-13 00:29:53 +02:00
  • 26102e0251 Dependencies: Update exllamav3 turboderp 2026-06-13 00:13:34 +02:00
  • 004e837412 exllamav3: Add draft_mode option, support MTP and n-gram drafting turboderp 2026-06-13 00:12:24 +02:00
  • 95d1278694 Model: Fix regression when no draft_gpu_split specified turboderp 2026-06-12 23:31:34 +02:00
  • 637b595bb6 Merge branch 'fork/baronrabban/fix/draft-model-gpu-split' turboderp 2026-06-12 21:13:39 +02:00
  • 9726fbf0a0 Dependencies: Update exllamav3 turboderp 2026-06-12 21:10:41 +02:00
  • 4c7249e98d Fix draft model ignoring draft_gpu_split on load baronrabban 2026-06-10 18:45:52 -04:00
  • 2e50555d37 Dependencies: Update exllamav3 turboderp 2026-06-06 23:15:43 +02:00
  • 624f2baebd Dependencies: Don't try to import exllamav2 turboderp 2026-06-05 01:58:47 +02:00
  • 8822b886ea Tools: Add Step 3.7 tool format alias (qwen3_coder compatible) turboderp 2026-06-02 17:21:53 +02:00
  • 210bbe78f5 exllamav3: Include stop conditions from backend tokenizer turboderp 2026-06-02 15:28:56 +02:00
  • ff4160051f Dependencies: Update exllamav3 turboderp 2026-06-01 03:47:31 +02:00
  • 7a23e48fc1 Dependencies: Enable flash-linear-attention on Windows turboderp 2026-05-31 20:50:22 +02:00
  • 95c1101bd2 Dependencies: Update exllamav3 turboderp 2026-05-29 22:17:24 +02:00
  • 510367d1ab Logging: Add comprehensive request logging option turboderp 2026-05-27 00:33:45 +02:00
  • dd792e1916 Dependencies: Update exllamav3 turboderp 2026-05-24 20:33:33 +02:00
  • 20cd52371a Docker: Update compose service turboderp 2026-05-24 20:33:03 +02:00
  • fef811d484 Dependencies: Add cu13 install option and Dockerfile (exllamav3 only) turboderp 2026-05-23 01:21:18 +02:00
  • 539289375c Dependencies: Add flash-linear-attention turboderp 2026-05-23 01:20:18 +02:00
  • ed97bbb2af Model: Add draft_num_tokens config option, update model container to forward draft and bsz args to backend turboderp 2026-05-23 00:40:42 +02:00
  • a430dce6f3 Config: Fix incorrect description of gpu_split as integer list turboderp 2026-05-22 23:30:37 +02:00
  • 2593fb79a2 Merge remote-tracking branch 'origin/main' turboderp 2026-05-14 12:56:51 +02:00
  • 857f9e21dd Merge pull request #422 turboderp 2026-05-14 12:56:40 +02:00
  • 52bc74b3f9 Start.py: improve dependency installation check and cleanup uv logging Optimal 2026-05-14 01:56:30 +09:00
  • 4de923d8b3 Add docker instructions to README.md turboderp 2026-05-10 11:26:53 +02:00
  • 838df5a3c7 Docker: Remove version from example docker-compose.yml turboderp 2026-05-10 11:25:15 +02:00
  • 64ad702416 Dependencies: Pin pydantic again (>2.11 breaks docker image) turboderp 2026-05-10 01:41:03 +02:00
  • 5818311d06 Dependencies: Pin correct xformers version torch 2.9 turboderp 2026-05-10 01:21:37 +02:00
  • 553c4e7cbb Docker: Serve on 0.0.0.0 by default turboderp 2026-05-09 23:22:56 +02:00
  • 5d964494b6 Merge remote-tracking branch 'origin/main' turboderp 2026-05-09 23:18:52 +02:00
  • 4a8cb08a24 Dependencies: Include triton and xformers turboderp 2026-05-09 23:14:30 +02:00
  • fd9591133d Dependencies: Update exllamav3, unpin pydantic turboderp 2026-05-09 23:01:07 +02:00
  • 54c1e56019 Update config_sample.yml (#418) RodriMora 2026-05-09 20:21:57 +01:00
  • 09f36f9c05 fix: prevent xformers from pulling cu130 wheels on cu128 hosts (#420) Josh 2026-05-09 12:21:17 -07:00
  • bc5de12c82 Dependencies: Fix Windows FA2 wheel URL for cp312 turboderp 2026-05-05 10:02:49 +02:00
  • 59494106c9 Dependencies: Update exllamav3 turboderp 2026-05-03 00:01:59 +02:00
  • 51b67595f4 Dependencies: Switch to mjun0812 flash-attn wheels turboderp 2026-05-03 00:01:29 +02:00
  • 6e97aa5fc1 Model: Fix model loading progress display when draft enabled turboderp 2026-05-02 20:30:38 +02:00
  • c06a6fbf7f API: Accept JSON schema in request.response_format.json_schema, delay JSON filter until start of content block turboderp 2026-05-02 20:29:59 +02:00
  • d0103c19a7 Dependencies: Bump exllamav3 turboderp 2026-04-29 00:55:59 +02:00
  • e909f7ecdb ExLlamaV3: Respect device split when loading draft model turboderp 2026-04-25 01:51:46 +02:00
  • 6aa842a1b2 Dependencies: Update exllamav3 turboderp 2026-04-20 23:11:30 +02:00
  • 3e3d7ccd54 Tools: Add step3_5 alias (qwen3_coder tool format) turboderp 2026-04-18 18:23:00 +02:00
  • ed41c51909 API: Prevent race condition when multiple chat requests try to inline-load the same model turboderp 2026-04-18 18:18:42 +02:00
  • 5b2b707af9 exllamav3: Account for bsz=2 in autosplit turboderp 2026-04-18 17:07:34 +02:00
  • 9ebbe06f29 exllamav3: Supply max_chunk_size when loading model turboderp 2026-04-18 13:20:12 +02:00
  • f74f16a5c2 Config: Make recurrent cache size configurable turboderp 2026-04-17 02:40:22 +02:00
  • bd589272cc Config: Make cuda_malloc_async configurable again, change import order to make sure config is loaded before torch is imported turboderp 2026-04-17 02:38:27 +02:00
  • 32eed618dc Dependencies: Add requests turboderp 2026-04-12 13:51:54 +02:00
  • 1a4896ce66 Tree: Format turboderp 2026-04-12 13:47:05 +02:00
  • 510bf7bf6c Update README.md turboderp 2026-04-12 13:44:26 +02:00
  • f1a2416da5 OAI endpoints: Add option to suppress header after reasoning start token (e.g. Gemma4's "thought\n") [main_tools] turboderp 2026-04-12 04:12:53 +02:00
  • 2636b445f0 Tree: Format turboderp 2026-04-12 03:33:14 +02:00
  • bb64f8f18e Dependencies: Update exllamav3 turboderp 2026-04-12 03:31:54 +02:00
  • 3a42c1756c ExLlamaV2: Use new disconnect handler turboderp 2026-04-10 22:04:21 +02:00
  • b21100f971 ExLlamaV3: Fix disconnected request handling regression turboderp 2026-04-10 22:03:19 +02:00
  • 08f92167de Tools: Updated/fixed Gemma4 tool parser mindkrypted 2026-04-10 22:02:34 +02:00
  • 5517cb5b9e Templates: Revert add_bos_token fix turboderp 2026-04-10 03:53:58 +02:00
  • 7fedc179f0 Templates: Make sure add_bos_token=False is respected turboderp 2026-04-10 03:14:29 +02:00
  • 27d29209c6 Tools: Add Gemma4 parser turboderp 2026-04-10 00:16:58 +02:00
  • 55124d0fc6 Config: Add force_enable_thinking turboderp 2026-04-10 00:16:40 +02:00
  • db9048e59b Docs: Tool calling turboderp 2026-04-08 19:39:42 +02:00
  • 79d581e1f5 OAI endpoints: More rework turboderp 2026-04-02 01:26:44 +02:00
  • c315f6b73e OAI endpoints: Correctly propagate exceptions in non-streaming mode turboderp 2026-04-01 12:27:07 +02:00
  • 455c09932f OAI endpoints: Fix regression for non-reasoning models turboderp 2026-04-01 00:08:39 +02:00
  • 0409064028 Tools: Refactor and further simplify tool parsing turboderp 2026-04-01 00:07:44 +02:00
  • b6428b1676 Seq: Allow longer strings in log turboderp 2026-03-31 18:18:07 +02:00
  • 112ab69002 Fix comments turboderp 2026-03-31 14:43:55 +02:00
  • bc66ba4b8b Merge branch 'main' into main_tools turboderp 2026-03-30 23:07:53 +02:00
  • c887ae88fc Dependencies: Update exllamav3 turboderp 2026-03-30 23:07:00 +02:00
  • a7c7934ec3 Tool parsing: Include outer <tool_call> tags in raw text sent to parser turboderp 2026-03-30 04:05:15 +02:00
  • 41ed1e4881 Seq: Sanitize extra log data turboderp 2026-03-30 03:36:30 +02:00
  • 02a700e065 ExLlamaV3: Limit MMEmbedding cache size turboderp 2026-03-30 03:35:46 +02:00
  • ba4309b948 ExLlamaV3: Replace MMEmbedding lru_cache with dict to avoid storing arbitrarily large uuencoded images as keys turboderp 2026-03-30 02:55:21 +02:00
  • a035bc9e94 Model: Fix regression turboderp 2026-03-30 02:37:27 +02:00
  • 9ee5ded218 OAI: Log raw requests turboderp 2026-03-30 01:23:16 +02:00
  • 357eebffd2 Logger: Fix invalid escape sequence (gave syntax warning) turboderp 2026-03-30 00:33:01 +02:00
  • 9f565562dd Add inference test scripts turboderp 2026-03-30 00:23:25 +02:00
  • 179479199b Rework tool calls and OAI chat completions turboderp 2026-03-30 00:17:47 +02:00
  • aa54098f26 Ruff: Format (line length) turboderp 2026-03-29 23:53:58 +02:00
  • 2a1503b283 Logging: Use debug level for Seq instead of verbose turboderp 2026-03-29 18:51:34 +02:00
  • 47d08729ed Ruff: Raise line length limit to 100 turboderp 2026-03-28 19:49:17 +01:00
  • 4b3c74782d Fix bad merge turboderp 2026-03-28 12:47:26 +01:00
  • b4dfd2e86f Fix logging turboderp 2026-03-28 01:13:23 +01:00
  • 56378b946d Merge branch 'fork/devnen/full-tool-calling-support' into main_seqlog turboderp 2026-03-28 01:06:54 +01:00
  • f3787de6a6 Ruff: Format turboderp 2026-03-27 21:47:24 +01:00
  • 83127ab4f8 Logging: Log messages via Seq wrapper turboderp 2026-03-27 21:38:24 +01:00
  • c32a628917 Logging: Add Seq wrapper turboderp 2026-03-27 21:08:26 +01:00