Commit Graph

711 Commits

Author SHA1 Message Date
turboderp
27c68d4e65 Update README.md 2026-01-10 15:59:46 +01:00
turboderp
539410a2a3 Support NanoChatForCausalLM 2026-01-10 15:59:08 +01:00
turboderp
3ecb9f54fb Merge pull request #136 from mindkrypted/feature/support-solar-open-moe
Add support for SolarOpenMoE architecture
2026-01-10 10:55:36 +01:00
mindkrypted
fd8659a6c3 Add support for SolarOpenMoE architecture 2026-01-07 13:45:23 -05:00
turboderp
703b05ab52 Update README.md 2026-01-06 16:08:23 +01:00
turboderp
a17d1a4334 Add HCXVisionV2ForCausalLM architecture 2026-01-06 16:01:54 +01:00
turboderp
7de8641fce Attn: Add varlen mode 2026-01-06 16:01:54 +01:00
turboderp
a026b32df3 Support IQuestCoderForCausalLM 2026-01-04 12:31:58 +01:00
turboderp
6e75e7b151 chat.py: Fix for models with eos_token_id=null 2026-01-04 02:02:10 +01:00
turboderp
227621e49e Support HyperCLOVAXForCausalLM 2026-01-03 03:22:50 +01:00
turboderp
a92cf0a13a Attn: Support custom softmax scale in SDPA mode 2026-01-03 03:22:13 +01:00
turboderp
cff5fd542c Embedding: Support embedding multiplier 2026-01-03 03:21:55 +01:00
turboderp
452803e73d Olmo3: Use default RoPE type for SWA layers 2025-12-26 21:38:56 +01:00
turboderp
195d01657a RoPE: Allow RoPE type override 2025-12-26 21:38:34 +01:00
turboderp
e8b77bba4a chat.py: Fix prompt tokens/s display 2025-12-25 23:18:50 +01:00
turboderp
80907797a5 chat.py: Add debug mode 2025-12-25 23:18:25 +01:00
turboderp
f0ea2ca858 Linear: Support new FP8 scale format 2025-12-23 21:05:05 +01:00
turboderp
2698a83022 RoPE: Let arch override theta key name 2025-12-23 21:04:41 +01:00
amanwalksdownthestreet
65cfaf3c60 arch_list: Strip NVIDIA arch suffixes (sm_120a, sm_90a, etc.) 2025-12-16 23:06:34 -07:00
turboderp
a32e2219af Allow -hb 16 while quantizing 2025-12-13 20:55:25 +01:00
turboderp
104268521c Support Olmo3ForCausalLM 2025-12-13 20:49:03 +01:00
turboderp
bd0f26cd0e Fix comments 2025-12-10 21:47:30 +01:00
turboderp
1b7009c5b8 Merge remote-tracking branch 'origin/master' v0.0.18 2025-12-10 10:43:17 +01:00
turboderp
f9d0e6038f Bump to v0.0.18 2025-12-10 10:42:41 +01:00
turboderp
d8be5d638f chat.py: Read all stop conditions from config.json 2025-12-10 00:53:45 +01:00
turboderp
9b75bc5f58 Support Ministral3ForCausalLM 2025-12-10 00:53:22 +01:00
turboderp
9663357c4f Convert: Print some more RoPE debug info 2025-12-10 00:52:49 +01:00
turboderp
24caf2c762 RoPE: Accept partial_rotary_factor in rope_parameters 2025-12-10 00:52:29 +01:00
kingbri
e49c02a3aa Actions: Add builds for torch 2.9
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-12-09 15:03:25 -05:00
turboderp
4d4992a8b8 GLM4: Update config parser to support 4.6V 2025-12-08 21:39:42 +01:00
turboderp
1385486592 Bump to v0.0.17 v0.0.17 2025-12-07 17:47:20 +01:00
turboderp
784d3dc7e7 GEMM: Optimize reduction a little bit 2025-12-06 01:56:21 +01:00
turboderp
15b9c2b421 Cleanup 2025-12-06 01:55:56 +01:00
turboderp
700b34695f Generator: Fix #118, make sure prepare_logit_mask is only called on jobs in the sample batch.
Thanks to @EthanAndersonUSA
2025-12-05 16:29:13 +01:00
turboderp
dc654cf4d8 MoE Routing kernel: Allow num_experts not divisible by 32 2025-12-05 13:22:39 +01:00
turboderp
0a629cf70a HumanEval: Add max batch size arg 2025-12-05 13:21:07 +01:00
turboderp
bb43823e32 Mistral3: Try to load preprocessor config from processor_config.json if preprocessor_config.json not present 2025-12-03 19:10:30 +01:00
turboderp
8e4f4faee4 RoPE: Support Llama 4 attn scaling (for Ministral) 2025-12-03 18:27:18 +01:00
turboderp
7de8994a2a Linear: Allow FP8 layers with global scale 2025-12-03 18:25:42 +01:00
turboderp
1fa6071bc3 Config: Recognize rope_parameters and rope_theta therein 2025-12-03 18:25:03 +01:00
turboderp
beb23d4095 Loader: Don't break if transposing 0D tensor 2025-12-03 18:24:13 +01:00
turboderp
ba657d399d chat.py: Add Ministral template 2025-12-03 18:23:34 +01:00
turboderp
88d3814bc5 Merge pull request #113 from yadirhb/patch-1
KeyError: 'cache_id' originating from the function recv_embeddings
2025-11-29 00:05:47 +01:00
Yadir Hernandez Batista
965f70a871 Same treatment to method 2025-11-28 15:26:26 -05:00
Yadir Hernandez Batista
c8f30494f9 Removed None since it's the default. 2025-11-28 15:20:16 -05:00
Yadir Hernandez Batista
f1d838251e KeyError: 'cache_id' originating from the function recv_embeddings
Ran into issues today while testing the new Qwen3-VL-Instruct_5.0bpw_H6:
```
Nov 28 19:03:58 tabby-api start.sh[34003]: 2025-11-28 19:03:58.428 INFO: Received chat completion streaming request e693c1eef51641df8d64dee63d490091
2025-11-28 19:03:58.823 ERROR: FATAL ERROR with generation. Attempting to recreate the generator. If this fails, please restart the server.
2025-11-28 19:03:58.825 WARNING: Immediately terminating all jobs. Clients will have their requests cancelled.
2025-11-28 19:03:58.830 ERROR: Traceback (most recent call last):
  File "/opt/tabbyAPI/endpoints/OAI/utils/chat_completion.py", line 373, in stream_generate_chat_completion
    raise generation
  File "/opt/tabbyAPI/endpoints/OAI/utils/completion.py", line 118, in _stream_collector
    async for generation in new_generation:
  File "/opt/tabbyAPI/backends/exllamav3/model.py", line 779, in stream_generate
    async for generation_chunk in self.generate_gen(
  File "/opt/tabbyAPI/backends/exllamav3/model.py", line 1060, in generate_gen
    raise ex
  File "/opt/tabbyAPI/backends/exllamav3/model.py", line 1002, in generate_gen
    async for result in job:
  File "/opt/tabbyAPI/venv/lib/python3.12/site-packages/exllamav3/generator/async_generator.py", line 87, in __aiter__
    raise result
  File "/opt/tabbyAPI/venv/lib/python3.12/site-packages/exllamav3/generator/async_generator.py", line 23, in _run_iteration
    results = self.generator.iterate()
  File "/opt/tabbyAPI/venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
  File "/opt/tabbyAPI/venv/lib/python3.12/site-packages/exllamav3/generator/generator.py", line 298, in iterate
    job.prefill(results)
  File "/opt/tabbyAPI/venv/lib/python3.12/site-packages/exllamav3/generator/job.py", line 1001, in prefill
    self.generator.model.prefill(
  File "/opt/tabbyAPI/venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
  File "/opt/tabbyAPI/venv/lib/python3.12/site-packages/exllamav3/model/model.py", line 101, in prefill
    return self.prefill_tp(x, params, self.last_kv_module_idx, self.modules)
  File "/opt/tabbyAPI/venv/lib/python3.12/site-packages/exllamav3/model/model_tp.py", line 379, in prefill_tp
    self.tp_worker_dispatch(device, mp_model_forward, (
  File "/opt/tabbyAPI/venv/lib/python3.12/site-packages/exllamav3/model/model_tp.py", line 171, in tp_worker_dispatch
    conn.send((fn, args))
  File "/opt/tabbyAPI/venv/lib/python3.12/site-packages/exllamav3/model/model_tp_fn.py", line 294, in send
    self.result = fn(self.local_context, *args)
  File "/opt/tabbyAPI/venv/lib/python3.12/site-packages/exllamav3/model/model_tp_fn.py", line 202, in mp_model_forward
    params["indexed_embeddings"] = recv_embeddings(consumer, p)
  File "/opt/tabbyAPI/venv/lib/python3.12/site-packages/exllamav3/tokenizer/mm_embedding.py", line 122, in recv_embeddings
    consumer.recv(dse) for dse in imp["deepstack_embeddings"]
  File "/opt/tabbyAPI/venv/lib/python3.12/site-packages/exllamav3/model/model_tp_shared.py", line 189, in recv
    cache_id = imp["cache_id"]
KeyError: 'cache_id'
2025-11-28 19:03:58.846 ERROR: Sent to request: Chat completion aborted. Please check the server console.
2025-11-28 19:03:58.890 INFO: 10.0.30.254:61377 - "GET /health HTTP/1.0" 503
2025-11-28 19:03:58.893 INFO: Received chat completion request ebf67afe70b74f8f8b34a28746951794
2025-11-28 19:04:00.577 ERROR: FATAL ERROR with generation. Attempting to recreate the generator. If this fails, please restart the server.
2025-11-28 19:04:00.578 WARNING: Immediately terminating all jobs. Clients will have their requests cancelled.
2025-11-28 19:04:00.580 INFO: 10.0.30.254:11907 - "GET /health HTTP/1.0" 503
2025-11-28 19:04:00.582 ERROR: Traceback (most recent call last):
  File "/opt/tabbyAPI/endpoints/OAI/utils/chat_completion.py", line 437, in generate_chat_completion
    generations = await asyncio.gather(*gen_tasks)
  File "/opt/tabbyAPI/backends/exllamav3/model.py", line 692, in generate
    async for generation in self.stream_generate(
  File "/opt/tabbyAPI/backends/exllamav3/model.py", line 779, in stream_generate
    async for generation_chunk in self.generate_gen(
  File "/opt/tabbyAPI/backends/exllamav3/model.py", line 1060, in generate_gen
    raise ex
  File "/opt/tabbyAPI/backends/exllamav3/model.py", line 1002, in generate_gen
    async for result in job:
  File "/opt/tabbyAPI/venv/lib/python3.12/site-packages/exllamav3/generator/async_generator.py", line 87, in __aiter__
    raise result
  File "/opt/tabbyAPI/venv/lib/python3.12/site-packages/exllamav3/generator/async_generator.py", line 23, in _run_iteration
    results = self.generator.iterate()
  File "/opt/tabbyAPI/venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
  File "/opt/tabbyAPI/venv/lib/python3.12/site-packages/exllamav3/generator/generator.py", line 298, in iterate
    job.prefill(results)
  File "/opt/tabbyAPI/venv/lib/python3.12/site-packages/exllamav3/generator/job.py", line 1001, in prefill
    self.generator.model.prefill(
  File "/opt/tabbyAPI/venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
  File "/opt/tabbyAPI/venv/lib/python3.12/site-packages/exllamav3/model/model.py", line 101, in prefill
    return self.prefill_tp(x, params, self.last_kv_module_idx, self.modules)
  File "/opt/tabbyAPI/venv/lib/python3.12/site-packages/exllamav3/model/model_tp.py", line 386, in prefill_tp
    r = self.tp_worker_result(device)
  File "/opt/tabbyAPI/venv/lib/python3.12/site-packages/exllamav3/model/model_tp.py", line 181, in tp_worker_result
    raise result
KeyError: 'cache_id'
2025-11-28 19:04:00.590 ERROR: Sent to request: Chat completion ebf67afe70b74f8f8b34a28746951794 aborted. Maybe the model was unloaded? Please check the server console.
2025-11-28 19:04:00.591 INFO: 10.0.30.254:50633 - "POST /v1/chat/completions HTTP/1.1" 503
2025-11-28 19:04:01.592 INFO: 10.0.30.254:35928 - "GET /health HTTP/1.0" 503
```

After reviewing with Copilot (I don't know this code base), I found a small yet valid change that got tabbyAPI working again without breaking the server.
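
The traceback bottoms out in `recv` in `model_tp_shared.py` indexing `imp["cache_id"]` on a message dict that lacks the key. The patch itself isn't reproduced in this log, but the follow-up commits ("Removed None since it's the default") suggest it amounts to the defensive-lookup pattern sketched below. This is a hypothetical illustration, not the actual exllamav3 code:

```python
# Hypothetical sketch of the kind of fix described above (not the real
# model_tp_shared.py code): tolerate messages that carry no "cache_id".
def recv(imp: dict):
    # Original failing pattern:
    #   cache_id = imp["cache_id"]   # raises KeyError when the key is absent
    cache_id = imp.get("cache_id")   # yields None when the key is absent
    return cache_id

# A TP worker message produced without a cache_id no longer crashes:
assert recv({"deepstack_embeddings": []}) is None
assert recv({"cache_id": "e693c1ee"}) == "e693c1ee"
```

Since `dict.get` already defaults to `None`, writing `imp.get("cache_id", None)` is redundant, which matches the "Removed None since it's the default" cleanup commit.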
2025-11-28 15:14:58 -05:00
turboderp
9e314e6c76 Bump to v0.0.16 v0.0.16 2025-11-25 17:54:31 +01:00
turboderp
85ae1e45b5 Add boilerplate for sharing MM/deepstack embeddings across TP model 2025-11-24 22:23:35 +01:00
turboderp
c9654130a5 Fix TP regression 2025-11-24 00:21:09 +01:00
turboderp
232cc1d8ea ParallelDecoderBlock: Fix regression 2025-11-16 14:27:56 +01:00