Commit Graph

711 Commits

Author SHA1 Message Date
turboderp
27c68d4e65 Update README.md 2026-01-10 15:59:46 +01:00
turboderp
539410a2a3 Support NanoChatForCausalLM 2026-01-10 15:59:08 +01:00
turboderp
3ecb9f54fb Merge pull request #136 from mindkrypted/feature/support-solar-open-moe
Add support for SolarOpenMoE architecture
2026-01-10 10:55:36 +01:00
mindkrypted
fd8659a6c3 Add support for SolarOpenMoE architecture 2026-01-07 13:45:23 -05:00
turboderp
703b05ab52 Update README.md 2026-01-06 16:08:23 +01:00
turboderp
a17d1a4334 Add HCXVisionV2ForCausalLM architecture 2026-01-06 16:01:54 +01:00
turboderp
7de8641fce Attn: Add varlen mode 2026-01-06 16:01:54 +01:00
turboderp
a026b32df3 Support IQuestCoderForCausalLM 2026-01-04 12:31:58 +01:00
turboderp
6e75e7b151 chat.py: Fix for models with eos_token_id=null 2026-01-04 02:02:10 +01:00
turboderp
227621e49e Support HyperCLOVAXForCausalLM 2026-01-03 03:22:50 +01:00
turboderp
a92cf0a13a Attn: Support custom softmax scale in SDPA mode 2026-01-03 03:22:13 +01:00
turboderp
cff5fd542c Embedding: Support embedding multiplier 2026-01-03 03:21:55 +01:00
turboderp
452803e73d Olmo3: Use default RoPE type for SWA layers 2025-12-26 21:38:56 +01:00
turboderp
195d01657a RoPE: Allow RoPE type override 2025-12-26 21:38:34 +01:00
turboderp
e8b77bba4a chat.py: Fix prompt tokens/s display 2025-12-25 23:18:50 +01:00
turboderp
80907797a5 chat.py: Add debug mode 2025-12-25 23:18:25 +01:00
turboderp
f0ea2ca858 Linear: Support new FP8 scale format 2025-12-23 21:05:05 +01:00
turboderp
2698a83022 RoPE: Let arch override theta key name 2025-12-23 21:04:41 +01:00
amanwalksdownthestreet
65cfaf3c60 arch_list: Strip NVIDIA arch suffixes (sm_120a, sm_90a, etc.) 2025-12-16 23:06:34 -07:00
turboderp
a32e2219af Allow -hb 16 while quantizing 2025-12-13 20:55:25 +01:00
turboderp
104268521c Support Olmo3ForCausalLM 2025-12-13 20:49:03 +01:00
turboderp
bd0f26cd0e Fix comments 2025-12-10 21:47:30 +01:00
turboderp
1b7009c5b8 Merge remote-tracking branch 'origin/master' v0.0.18 2025-12-10 10:43:17 +01:00
turboderp
f9d0e6038f Bump to v0.0.18 2025-12-10 10:42:41 +01:00
turboderp
d8be5d638f chat.py: Read all stop conditions from config.json 2025-12-10 00:53:45 +01:00
turboderp
9b75bc5f58 Support Ministral3ForCausalLM 2025-12-10 00:53:22 +01:00
turboderp
9663357c4f Convert: Print some more RoPE debug info 2025-12-10 00:52:49 +01:00
turboderp
24caf2c762 RoPE: Accept partial_rotary_factor in rope_parameters 2025-12-10 00:52:29 +01:00
kingbri
e49c02a3aa Actions: Add builds for torch 2.9
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-12-09 15:03:25 -05:00
turboderp
4d4992a8b8 GLM4: Update config parser to support 4.6V 2025-12-08 21:39:42 +01:00
turboderp
1385486592 Bump to v0.0.17 v0.0.17 2025-12-07 17:47:20 +01:00
turboderp
784d3dc7e7 GEMM: Optimize reduction a little bit 2025-12-06 01:56:21 +01:00
turboderp
15b9c2b421 Cleanup 2025-12-06 01:55:56 +01:00
turboderp
700b34695f Generator: Fix #118, make sure prepare_logit_mask is only called on jobs in the sample batch.
Thanks to @EthanAndersonUSA
2025-12-05 16:29:13 +01:00
turboderp
dc654cf4d8 MoE Routing kernel: Allow num_experts not divisible by 32 2025-12-05 13:22:39 +01:00
turboderp
0a629cf70a HumanEval: Add max batch size arg 2025-12-05 13:21:07 +01:00
turboderp
bb43823e32 Mistral3: Try to load preprocessor config from processor_config.json if preprocessor_config.json not present 2025-12-03 19:10:30 +01:00
turboderp
8e4f4faee4 RoPE: Support Llama 4 attn scaling (for Ministral) 2025-12-03 18:27:18 +01:00
turboderp
7de8994a2a Linear: Allow FP8 layers with global scale 2025-12-03 18:25:42 +01:00
turboderp
1fa6071bc3 Config: Recognize rope_parameters and rope_theta therein 2025-12-03 18:25:03 +01:00
turboderp
beb23d4095 Loader: Don't break if transposing 0D tensor 2025-12-03 18:24:13 +01:00
turboderp
ba657d399d chat.py: Add Ministral template 2025-12-03 18:23:34 +01:00
turboderp
88d3814bc5 Merge pull request #113 from yadirhb/patch-1
KeyError: 'cache_id' originating from the function recv_embeddings
2025-11-29 00:05:47 +01:00
Yadir Hernandez Batista
965f70a871 Same treatment to method 2025-11-28 15:26:26 -05:00
Yadir Hernandez Batista
c8f30494f9 Removed None since it's the default. 2025-11-28 15:20:16 -05:00
Yadir Hernandez Batista
f1d838251e KeyError: 'cache_id' originating from the function recv_embeddings
Ran into issues today while testing the new Qwen3-VL-Instruct_5.0bpw_H6:
```
Nov 28 19:03:58 tabby-api start.sh[34003]: 2025-11-28 19:03:58.428 INFO: Received chat completion streaming request e693c1eef51641df8d64dee63d490091
2025-11-28 19:03:58.823 ERROR: FATAL ERROR with generation. Attempting to recreate the generator. If this fails, please restart the server.
2025-11-28 19:03:58.825 WARNING: Immediately terminating all jobs. Clients will have their requests cancelled.
2025-11-28 19:03:58.830 ERROR: Traceback (most recent call last):
  File "/opt/tabbyAPI/endpoints/OAI/utils/chat_completion.py", line 373, in stream_generate_chat_completion
    raise generation
  File "/opt/tabbyAPI/endpoints/OAI/utils/completion.py", line 118, in _stream_collector
    async for generation in new_generation:
  File "/opt/tabbyAPI/backends/exllamav3/model.py", line 779, in stream_generate
    async for generation_chunk in self.generate_gen(
  File "/opt/tabbyAPI/backends/exllamav3/model.py", line 1060, in generate_gen
    raise ex
  File "/opt/tabbyAPI/backends/exllamav3/model.py", line 1002, in generate_gen
    async for result in job:
  File "/opt/tabbyAPI/venv/lib/python3.12/site-packages/exllamav3/generator/async_generator.py", line 87, in __aiter__
    raise result
  File "/opt/tabbyAPI/venv/lib/python3.12/site-packages/exllamav3/generator/async_generator.py", line 23, in _run_iteration
    results = self.generator.iterate()
  File "/opt/tabbyAPI/venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
  File "/opt/tabbyAPI/venv/lib/python3.12/site-packages/exllamav3/generator/generator.py", line 298, in iterate
    job.prefill(results)
  File "/opt/tabbyAPI/venv/lib/python3.12/site-packages/exllamav3/generator/job.py", line 1001, in prefill
    self.generator.model.prefill(
  File "/opt/tabbyAPI/venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
  File "/opt/tabbyAPI/venv/lib/python3.12/site-packages/exllamav3/model/model.py", line 101, in prefill
    return self.prefill_tp(x, params, self.last_kv_module_idx, self.modules)
  File "/opt/tabbyAPI/venv/lib/python3.12/site-packages/exllamav3/model/model_tp.py", line 379, in prefill_tp
    self.tp_worker_dispatch(device, mp_model_forward, (
  File "/opt/tabbyAPI/venv/lib/python3.12/site-packages/exllamav3/model/model_tp.py", line 171, in tp_worker_dispatch
    conn.send((fn, args))
  File "/opt/tabbyAPI/venv/lib/python3.12/site-packages/exllamav3/model/model_tp_fn.py", line 294, in send
    self.result = fn(self.local_context, *args)
  File "/opt/tabbyAPI/venv/lib/python3.12/site-packages/exllamav3/model/model_tp_fn.py", line 202, in mp_model_forward
    params["indexed_embeddings"] = recv_embeddings(consumer, p)
  File "/opt/tabbyAPI/venv/lib/python3.12/site-packages/exllamav3/tokenizer/mm_embedding.py", line 122, in recv_embeddings
    consumer.recv(dse) for dse in imp["deepstack_embeddings"]
  File "/opt/tabbyAPI/venv/lib/python3.12/site-packages/exllamav3/model/model_tp_shared.py", line 189, in recv
    cache_id = imp["cache_id"]
KeyError: 'cache_id'
2025-11-28 19:03:58.846 ERROR: Sent to request: Chat completion aborted. Please check the server console.
2025-11-28 19:03:58.890 INFO: 10.0.30.254:61377 - "GET /health HTTP/1.0" 503
2025-11-28 19:03:58.893 INFO: Received chat completion request ebf67afe70b74f8f8b34a28746951794
2025-11-28 19:04:00.577 ERROR: FATAL ERROR with generation. Attempting to recreate the generator. If this fails, please restart the server.
2025-11-28 19:04:00.578 WARNING: Immediately terminating all jobs. Clients will have their requests cancelled.
2025-11-28 19:04:00.580 INFO: 10.0.30.254:11907 - "GET /health HTTP/1.0" 503
2025-11-28 19:04:00.582 ERROR: Traceback (most recent call last):
  File "/opt/tabbyAPI/endpoints/OAI/utils/chat_completion.py", line 437, in generate_chat_completion
    generations = await asyncio.gather(*gen_tasks)
  File "/opt/tabbyAPI/backends/exllamav3/model.py", line 692, in generate
    async for generation in self.stream_generate(
  File "/opt/tabbyAPI/backends/exllamav3/model.py", line 779, in stream_generate
    async for generation_chunk in self.generate_gen(
  File "/opt/tabbyAPI/backends/exllamav3/model.py", line 1060, in generate_gen
    raise ex
  File "/opt/tabbyAPI/backends/exllamav3/model.py", line 1002, in generate_gen
    async for result in job:
  File "/opt/tabbyAPI/venv/lib/python3.12/site-packages/exllamav3/generator/async_generator.py", line 87, in __aiter__
    raise result
  File "/opt/tabbyAPI/venv/lib/python3.12/site-packages/exllamav3/generator/async_generator.py", line 23, in _run_iteration
    results = self.generator.iterate()
  File "/opt/tabbyAPI/venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
  File "/opt/tabbyAPI/venv/lib/python3.12/site-packages/exllamav3/generator/generator.py", line 298, in iterate
    job.prefill(results)
  File "/opt/tabbyAPI/venv/lib/python3.12/site-packages/exllamav3/generator/job.py", line 1001, in prefill
    self.generator.model.prefill(
  File "/opt/tabbyAPI/venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
  File "/opt/tabbyAPI/venv/lib/python3.12/site-packages/exllamav3/model/model.py", line 101, in prefill
    return self.prefill_tp(x, params, self.last_kv_module_idx, self.modules)
  File "/opt/tabbyAPI/venv/lib/python3.12/site-packages/exllamav3/model/model_tp.py", line 386, in prefill_tp
    r = self.tp_worker_result(device)
  File "/opt/tabbyAPI/venv/lib/python3.12/site-packages/exllamav3/model/model_tp.py", line 181, in tp_worker_result
    raise result
KeyError: 'cache_id'
2025-11-28 19:04:00.590 ERROR: Sent to request: Chat completion ebf67afe70b74f8f8b34a28746951794 aborted. Maybe the model was unloaded? Please check the server console.
2025-11-28 19:04:00.591 INFO: 10.0.30.254:50633 - "POST /v1/chat/completions HTTP/1.1" 503
2025-11-28 19:04:01.592 INFO: 10.0.30.254:35928 - "GET /health HTTP/1.0" 503
```

After reviewing with Copilot (I don't know this code base), I found a small yet valid change that got tabbyAPI working again without breaking the server.
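
The traceback bottoms out in `recv` in `model_tp_shared.py` indexing `imp["cache_id"]` on a message dict that lacks the key. The patch itself isn't reproduced in this log, but the follow-up commits ("Removed None since it's the default") suggest it amounts to the defensive-lookup pattern sketched below. This is a hypothetical illustration, not the actual exllamav3 code:

```python
# Hypothetical sketch of the kind of fix described above (not the real
# model_tp_shared.py code): tolerate messages that carry no "cache_id".
def recv(imp: dict):
    # Original failing pattern:
    #   cache_id = imp["cache_id"]   # raises KeyError when the key is absent
    cache_id = imp.get("cache_id")   # yields None when the key is absent
    return cache_id

# A TP worker message produced without a cache_id no longer crashes:
assert recv({"deepstack_embeddings": []}) is None
assert recv({"cache_id": "e693c1ee"}) == "e693c1ee"
```

Since `dict.get` already defaults to `None`, writing `imp.get("cache_id", None)` is redundant, which matches the "Removed None since it's the default" cleanup commit.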
2025-11-28 15:14:58 -05:00
turboderp
9e314e6c76 Bump to v0.0.16 v0.0.16 2025-11-25 17:54:31 +01:00
turboderp
85ae1e45b5 Add boilerplate for sharing MM/deepstack embeddings across TP model 2025-11-24 22:23:35 +01:00
turboderp
c9654130a5 Fix TP regression 2025-11-24 00:21:09 +01:00
turboderp
232cc1d8ea ParallelDecoderBlock: Fix regression 2025-11-16 14:27:56 +01:00