Commit Graph

  • 724060b058 Dependencies: Update exllamav3 (main) turboderp 2026-03-13 23:14:09 +01:00
  • 761e26a137 Dependencies: Update exllamav3 turboderp 2026-03-05 18:09:34 +01:00
  • 41511f56c6 Dependencies: Update exllamav3 turboderp 2026-02-09 22:54:29 +01:00
  • 54e3ea1fb3 Tree: Format turboderp 2026-01-20 22:57:36 +01:00
  • 0985c7f7b7 Sampling: Add adaptive-P params turboderp 2026-01-20 19:09:54 +01:00
  • 8a824cb127 Dependencies: Update exllamav3 turboderp 2026-01-20 18:52:44 +01:00
  • 84bb1ce9fd Dependencies: Fix FA2 wheels kingbri 2025-12-19 16:52:05 -05:00
  • 5627f4d69e Dependencies: Update to torch 2.9 kingbri 2025-12-19 15:59:40 -05:00
  • f04fc6eb25 Dependencies: Update exllamav3 turboderp 2025-12-16 12:58:31 +01:00
  • 55288e5a1f Merge pull request #402 from AlpinDale/auto-select-gpu Brian 2025-12-08 22:04:26 -05:00
  • 76ffc7c458 [startup] auto-select GPU backend AlpinDale 2025-12-08 23:52:02 +00:00
  • 8b6b793bfc Dependencies: Update exllamav3 turboderp 2025-11-25 21:17:31 +01:00
  • 685aca5a7d Merge pull request #397 from beep39/json-schema-for-exllamav3 Brian 2025-11-24 22:34:31 -05:00
  • 126759034e Tree: Format kingbri 2025-11-24 22:32:19 -05:00
  • f50015af5e Dependencies: Update exllamav3 turboderp 2025-11-23 23:27:26 +01:00
  • df724fdc78 Merge pull request #393 from mefich/main Brian 2025-11-19 22:46:59 -05:00
  • d53ca1345a Constrained generation with json schema for ExllamaV3 beep39 2025-11-18 01:57:54 +09:00
  • fece4791ad exllamav2: Make sure cache size is set in unpaged mode turboderp 2025-11-06 21:03:24 +01:00
  • 368e87eb7d Fix exllamav3 URL turboderp 2025-11-03 12:35:13 +01:00
  • c6bf59063d Dependencies: Update exllamav3 turboderp 2025-11-02 23:45:34 +01:00
  • 37aea9de83 Update exl3 backend model.py: fix for unloading vision models mefich 2025-10-30 12:30:23 +05:00
  • 996bc8dbe1 Dependencies: Update exllamav3 turboderp 2025-10-17 23:41:44 +02:00
  • 2539acf800 Dependencies: Update exllamav3 turboderp 2025-10-15 16:01:57 +02:00
  • 486dd0418e Formatting turboderp 2025-10-15 10:47:58 +02:00
  • 0af29d957a Fix #390 turboderp 2025-10-15 10:40:19 +02:00
  • ad64942fa1 Tree: Format kingbri 2025-10-14 23:49:13 -04:00
  • f205349c81 Config: Fix use_as_default application kingbri 2025-10-14 23:35:39 -04:00
  • 6f73a0b388 Tree: Format kingbri 2025-10-14 23:06:20 -04:00
  • 5cb8f3ed2c Config: Fix comments for max_seq_len and cache_size kingbri 2025-10-14 23:04:36 -04:00
  • fdb86f4c63 ExllamaV2: Add max_seq_len empty case like ExllamaV3 kingbri 2025-10-14 23:02:52 -04:00
  • 69a25d7fa6 Config + Endpoints: Make cache_size more prominent kingbri 2025-10-14 21:53:33 -04:00
  • 62e9fa217a ExllamaV3: Handle max_seq_len defined and cache_size undefined case kingbri 2025-10-14 21:48:36 -04:00
  • 04ca346732 Fix formatting turboderp 2025-10-14 03:11:59 +02:00
  • ec50ad17ea Merge branch 'main_seq' turboderp 2025-10-14 02:58:00 +02:00
  • 8abdfe7b13 Config: replace disable_output_chunking flag with output_chunking (main_seq) turboderp 2025-10-14 02:47:52 +02:00
  • 7eee3924c7 Merge remote-tracking branch 'origin/main_seq' into main_seq turboderp 2025-10-14 00:58:42 +02:00
  • f73e88e9e9 Dependencies: update exllamav3 turboderp 2025-10-14 00:58:14 +02:00
  • 85459ce600 Tree: Format kingbri 2025-10-09 22:33:53 -04:00
  • 01a5915a7b Dependencies: Pin Pydantic to version 2.11.0 turboderp 2025-10-08 20:43:26 +02:00
  • 4235f98e83 Model: Change cache_size/max_seq_len behavior turboderp 2025-10-05 22:15:27 +02:00
  • d672dc2137 API: Fix race condition when client disconnects turboderp 2025-10-05 21:23:02 +02:00
  • 52e093ae6c Model: Enable max_rq_tokens (output chunking) turboderp 2025-10-05 18:54:45 +02:00
  • e09a61969f Model: Fix NCCL detection turboderp 2025-10-05 18:52:37 +02:00
  • 7a0dddcbd9 Dependencies: Update exllamav3 kingbri 2025-09-30 17:33:23 -04:00
  • 1d3a308709 Fix wiki link in README.md turboderp 2025-08-26 13:03:18 +02:00
  • d7eb580e99 Start: Fix uv check kingbri 2025-08-21 18:23:42 -04:00
  • 4036c70d75 Tree: Format kingbri 2025-08-19 22:58:25 -04:00
  • bd3aa5bb04 Docs: Add uv section kingbri 2025-08-19 22:54:27 -04:00
  • 1f4186512e Start: Add check for uv kingbri 2025-08-19 22:49:13 -04:00
  • 30a3cd75cf Start: Migrate options from cu121/118 to cu12 kingbri 2025-08-19 22:25:30 -04:00
  • 1344726936 Docs: Sampler overrides part 2 kingbri 2025-08-19 21:19:12 -04:00
  • 86f27c9c93 Merge pull request #377 from DocShotgun/main Brian 2025-08-18 23:12:34 -04:00
  • e07df3951e Docs: Update sampler overrides kingbri 2025-08-18 23:06:16 -04:00
  • 067d63773e Config: Move sampling higher in the list kingbri 2025-08-18 22:55:03 -04:00
  • 6fb0c2cdbd Config: Update description for override_preset default * We provide safe_defaults as a default in config_sample.yml but not internally DocShotgun 2025-08-18 12:39:52 -07:00
  • 998abe5ad1 Config: Enable safe sampler overrides by default * Provides safe fallback samplers, intended for better out-of-the-box support for clients that do not pass sampler params DocShotgun 2025-08-18 12:32:28 -07:00
  • a4d02c2b70 Model: Add log messages for model loading kingbri 2025-08-17 23:09:27 -04:00
  • a3a32c30a4 Model: Add utils file kingbri 2025-08-17 22:43:19 -04:00
  • 05791a25a1 Merge pull request #375 from Ph0rk0z/patch-1 Brian 2025-08-17 22:37:25 -04:00
  • 43f9483bc4 Model: Add tensor_parallel_backend option kingbri 2025-08-17 21:42:30 -04:00
  • b9952f319e Merge branch 'main' into exl3-tp kingbri 2025-08-17 21:21:40 -04:00
  • f2a39e3a61 Dependencies: Update exllama, torch, and flash attention kingbri 2025-08-17 21:19:23 -04:00
  • 60ae419746 Model.py TP changes Forkoz 2025-08-12 21:01:54 +00:00
  • 6623dbcd86 Merge pull request #373 from AUTOMATIC1111/exl3-logprobs Brian 2025-08-05 01:24:06 -04:00
  • fe149489af Tree: Format kingbri 2025-08-05 01:22:18 -04:00
  • 83f778db2d Merge pull request #374 from DocShotgun/main Brian 2025-08-05 01:18:25 -04:00
  • 81a115b781 Templating: Support chat_template.jinja DocShotgun 2025-08-03 16:10:08 -07:00
  • 056527ceb3 add logprobs support for exl3 AUTOMATIC 2025-08-03 11:42:32 +03:00
  • 03d72a37be Merge pull request #371 from DocShotgun/main Brian 2025-08-01 14:02:57 -04:00
  • 102af306e5 Config: Remove developer arg cuda_malloc_backend * cudaMallocAsync is now enabled by default on supported configurations DocShotgun 2025-08-01 10:59:13 -07:00
  • 113643c0df Main: Enable cudaMallocAsync backend by default kingbri 2025-07-27 22:29:46 -04:00
  • 0b4ca567f8 API: Persist request IDs and append full_text to finish chunk kingbri 2025-07-24 22:28:40 -04:00
  • e77fa0b7a8 Docs: Edit inline loading for breaking changes kingbri 2025-07-24 18:11:42 -04:00
  • ab04a6ed60 Dependencies: Bump ExllamaV3 kingbri 2025-07-18 22:56:35 -04:00
  • bf936f5c39 Dependencies: Update exllamav2 kingbri 2025-07-13 23:33:12 -04:00
  • 2419d2d0a3 Merge pull request #364 from theroyallab/tool-calls Brian 2025-07-11 11:34:10 -04:00
  • 707d005aad API: Default tool call ID and type (tool-calls) kingbri 2025-07-11 01:11:09 -04:00
  • 5b1db3ad83 API: Don't do a second re-render when tool calling kingbri 2025-07-06 11:32:36 -04:00
  • 3dfa965019 API: Add tool_call_id for role = tool kingbri 2025-07-05 21:52:26 -04:00
  • 1c3f84151f Docs: Update tool calling kingbri 2025-07-05 21:43:04 -04:00
  • 871f71c4e7 Templates: Adjust tool call example kingbri 2025-07-05 21:24:30 -04:00
  • 879f4cee7e API: Modify tool calling for wider compat kingbri 2025-07-05 14:28:12 -04:00
  • b6a26da50c API: Fix tool call serialization kingbri 2025-07-04 15:02:49 -04:00
  • d23fefbecd API + Model: Fix application of defaults kingbri 2025-07-03 14:37:34 -04:00
  • d339139fb6 Config: Deep merge model overrides kingbri 2025-07-03 12:17:09 -04:00
  • 0152a1665b Downloader: Switch to use API sizes kingbri 2025-06-30 12:49:53 -04:00
  • 03ff4c3128 Downloader: Handle if Content-Length is undefined kingbri 2025-06-30 11:43:22 -04:00
  • 0ae878712e Exl3: Clear image embedding cache on unload turboderp 2025-06-25 23:56:21 +02:00
  • e362319a4d Merge pull request #358 from theroyallab/breaking Brian 2025-06-17 23:10:16 -04:00
  • a02d39de31 Model: Remove rogue print (breaking) kingbri 2025-06-17 23:09:07 -04:00
  • 2913ce29fc API: Add timings to usage stats kingbri 2025-06-17 22:54:51 -04:00
  • 5d94d4d022 Merge branch 'main' into breaking kingbri 2025-06-17 22:24:32 -04:00
  • 122d87ac36 Tree: Format turboderp 2025-06-15 19:33:14 +02:00
  • 21c5af48e1 Tree: Format turboderp 2025-06-15 19:30:38 +02:00
  • 1c9891bf04 Exl3: Add vision capability turboderp 2025-06-15 19:22:51 +02:00
  • 4605c0f6bd Common: Refactor get_image to common functions turboderp 2025-06-15 19:20:36 +02:00
  • d357f100d0 Dependencies: Bump ExllamaV3 turboderp 2025-06-15 19:12:45 +02:00
  • a0c16bba2a Exl2: Fix banned_strings (move outside of assign_gen_params) turboderp 2025-06-15 16:51:42 +02:00
  • 2096c9bad2 Model: Default max_seq_len to 4096 kingbri 2025-06-13 14:12:03 -04:00
  • 322f9b773a Model: Migrate inline config to new format kingbri 2025-05-26 20:51:28 -04:00