Commit Graph

722 Commits

Author SHA1 Message Date
turboderp
1fa6071bc3 Config: Recognize rope_parameters and rope_theta therein 2025-12-03 18:25:03 +01:00
turboderp
beb23d4095 Loader: Don't break if transposing 0D tensor 2025-12-03 18:24:13 +01:00
turboderp
ba657d399d chat.py: Add Ministral template 2025-12-03 18:23:34 +01:00
turboderp
88d3814bc5 Merge pull request #113 from yadirhb/patch-1
KeyError: 'cache_id' originating from the function recv_embeddings
2025-11-29 00:05:47 +01:00
Yadir Hernandez Batista
965f70a871 Same treatment to method 2025-11-28 15:26:26 -05:00
Yadir Hernandez Batista
c8f30494f9 Removed None since it's the default. 2025-11-28 15:20:16 -05:00
Yadir Hernandez Batista
f1d838251e KeyError: 'cache_id' originating from the function recv_embeddings
Ran into issues today while testing the new Qwen3-VL-Instruct_5.0bpw_H6:
```
2025-11-28 19:03:58.428 INFO: Received chat completion streaming request e693c1eef51641df8d64dee63d490091
2025-11-28 19:03:58.823 ERROR: FATAL ERROR with generation. Attempting to recreate the generator. If this fails, please restart the server.
2025-11-28 19:03:58.825 WARNING: Immediately terminating all jobs. Clients will have their requests cancelled.
2025-11-28 19:03:58.830 ERROR: Traceback (most recent call last):
  File "/opt/tabbyAPI/endpoints/OAI/utils/chat_completion.py", line 373, in stream_generate_chat_completion
    raise generation
  File "/opt/tabbyAPI/endpoints/OAI/utils/completion.py", line 118, in _stream_collector
    async for generation in new_generation:
  File "/opt/tabbyAPI/backends/exllamav3/model.py", line 779, in stream_generate
    async for generation_chunk in self.generate_gen(
  File "/opt/tabbyAPI/backends/exllamav3/model.py", line 1060, in generate_gen
    raise ex
  File "/opt/tabbyAPI/backends/exllamav3/model.py", line 1002, in generate_gen
    async for result in job:
  File "/opt/tabbyAPI/venv/lib/python3.12/site-packages/exllamav3/generator/async_generator.py", line 87, in __aiter__
    raise result
  File "/opt/tabbyAPI/venv/lib/python3.12/site-packages/exllamav3/generator/async_generator.py", line 23, in _run_iteration
    results = self.generator.iterate()
              ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/tabbyAPI/venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/tabbyAPI/venv/lib/python3.12/site-packages/exllamav3/generator/generator.py", line 298, in iterate
    job.prefill(results)
  File "/opt/tabbyAPI/venv/lib/python3.12/site-packages/exllamav3/generator/job.py", line 1001, in prefill
    self.generator.model.prefill(
  File "/opt/tabbyAPI/venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/tabbyAPI/venv/lib/python3.12/site-packages/exllamav3/model/model.py", line 101, in prefill
    return self.prefill_tp(x, params, self.last_kv_module_idx, self.modules)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/tabbyAPI/venv/lib/python3.12/site-packages/exllamav3/model/model_tp.py", line 379, in prefill_tp
    self.tp_worker_dispatch(device, mp_model_forward, (
  File "/opt/tabbyAPI/venv/lib/python3.12/site-packages/exllamav3/model/model_tp.py", line 171, in tp_worker_dispatch
    conn.send((fn, args))
  File "/opt/tabbyAPI/venv/lib/python3.12/site-packages/exllamav3/model/model_tp_fn.py", line 294, in send
    self.result = fn(self.local_context, *args)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/tabbyAPI/venv/lib/python3.12/site-packages/exllamav3/model/model_tp_fn.py", line 202, in mp_model_forward
    params["indexed_embeddings"] = recv_embeddings(consumer, p)
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/tabbyAPI/venv/lib/python3.12/site-packages/exllamav3/tokenizer/mm_embedding.py", line 122, in recv_embeddings
    consumer.recv(dse) for dse in imp["deepstack_embeddings"]
                                  ^^^^^^^^^^^^^^^^^^
  File "/opt/tabbyAPI/venv/lib/python3.12/site-packages/exllamav3/model/model_tp_shared.py", line 189, in recv
    cache_id = imp["cache_id"]
               ~~~^^^^^^^^^^^^
KeyError: 'cache_id'
2025-11-28 19:03:58.846 ERROR: Sent to request: Chat completion aborted. Please check the server console.
2025-11-28 19:03:58.890 INFO: 10.0.30.254:61377 - "GET /health HTTP/1.0" 503
2025-11-28 19:03:58.893 INFO: Received chat completion request ebf67afe70b74f8f8b34a28746951794
2025-11-28 19:04:00.577 ERROR: FATAL ERROR with generation. Attempting to recreate the generator. If this fails, please restart the server.
2025-11-28 19:04:00.578 WARNING: Immediately terminating all jobs. Clients will have their requests cancelled.
2025-11-28 19:04:00.580 INFO: 10.0.30.254:11907 - "GET /health HTTP/1.0" 503
2025-11-28 19:04:00.582 ERROR: Traceback (most recent call last):
  File "/opt/tabbyAPI/endpoints/OAI/utils/chat_completion.py", line 437, in generate_chat_completion
    generations = await asyncio.gather(*gen_tasks)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/tabbyAPI/backends/exllamav3/model.py", line 692, in generate
    async for generation in self.stream_generate(
  File "/opt/tabbyAPI/backends/exllamav3/model.py", line 779, in stream_generate
    async for generation_chunk in self.generate_gen(
  File "/opt/tabbyAPI/backends/exllamav3/model.py", line 1060, in generate_gen
    raise ex
  File "/opt/tabbyAPI/backends/exllamav3/model.py", line 1002, in generate_gen
    async for result in job:
  File "/opt/tabbyAPI/venv/lib/python3.12/site-packages/exllamav3/generator/async_generator.py", line 87, in __aiter__
    raise result
  File "/opt/tabbyAPI/venv/lib/python3.12/site-packages/exllamav3/generator/async_generator.py", line 23, in _run_iteration
    results = self.generator.iterate()
              ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/tabbyAPI/venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/tabbyAPI/venv/lib/python3.12/site-packages/exllamav3/generator/generator.py", line 298, in iterate
    job.prefill(results)
  File "/opt/tabbyAPI/venv/lib/python3.12/site-packages/exllamav3/generator/job.py", line 1001, in prefill
    self.generator.model.prefill(
  File "/opt/tabbyAPI/venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/tabbyAPI/venv/lib/python3.12/site-packages/exllamav3/model/model.py", line 101, in prefill
    return self.prefill_tp(x, params, self.last_kv_module_idx, self.modules)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/tabbyAPI/venv/lib/python3.12/site-packages/exllamav3/model/model_tp.py", line 386, in prefill_tp
    r = self.tp_worker_result(device)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/tabbyAPI/venv/lib/python3.12/site-packages/exllamav3/model/model_tp.py", line 181, in tp_worker_result
    raise result
KeyError: 'cache_id'
2025-11-28 19:04:00.590 ERROR: Sent to request: Chat completion ebf67afe70b74f8f8b34a28746951794 aborted. Maybe the model was unloaded? Please check the server console.
2025-11-28 19:04:00.591 INFO: 10.0.30.254:50633 - "POST /v1/chat/completions HTTP/1.1" 503
2025-11-28 19:04:01.592 INFO: 10.0.30.254:35928 - "GET /health HTTP/1.0" 503
```

After reviewing with Copilot (I don't know this code base), I found a small but valid change that got tabbyAPI working again without breaking the server.
2025-11-28 15:14:58 -05:00
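The failure mode in the traceback above — a tensor-parallel consumer indexing a metadata dict that may not contain a `"cache_id"` entry — can be sketched as follows. This is a hypothetical simplification, not the actual exllamav3 code: `recv_strict` and `recv_tolerant` are illustrative names, and the real fix in `model_tp_shared.py` may differ in detail.

```python
# Hypothetical sketch of the bug and the defensive-lookup fix.
# The dict key "cache_id" mirrors the traceback; the function names
# and fallback behavior are assumptions for illustration only.

def recv_strict(imp: dict):
    # Original pattern: raises KeyError when the producer side
    # never attached a "cache_id" to the embedding metadata.
    return imp["cache_id"]

def recv_tolerant(imp: dict):
    # Fixed pattern: treat a missing "cache_id" as "nothing cached"
    # instead of crashing the whole generator.
    cache_id = imp.get("cache_id")  # None when the key is absent
    if cache_id is None:
        return None                 # caller falls back to a fresh receive
    return cache_id

# Reproduce the reported crash with an empty metadata dict:
try:
    recv_strict({})
except KeyError as e:
    print(f"KeyError: {e}")        # same error as in the server log

print(recv_tolerant({}))           # degrades gracefully instead
```

Note that `dict.get("cache_id")` already returns `None` for a missing key, which is presumably why the follow-up commit c8f30494f9 ("Removed None since it's the default") dropped the explicit default argument.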
turboderp
9e314e6c76 Bump to v0.0.16 v0.0.16 2025-11-25 17:54:31 +01:00
turboderp
85ae1e45b5 Add boilerplate for sharing MM/deepstack embeddings across TP model 2025-11-24 22:23:35 +01:00
turboderp
c9654130a5 Fix TP regression 2025-11-24 00:21:09 +01:00
turboderp
232cc1d8ea ParallelDecoderBlock: Fix regression 2025-11-16 14:27:56 +01:00
turboderp
ef8fd43d1c Cleanup unused imports 2025-11-16 14:25:46 +01:00
turboderp
2fc131efc1 Bump to v0.0.15 v0.0.15 2025-11-16 13:51:37 +01:00
turboderp
9b58b45999 Update README.md 2025-11-13 17:23:41 +01:00
turboderp
c47c17bae2 Add Glm4V-MoE architecture 2025-11-13 16:56:29 +01:00
turboderp
78c80485e6 Qwen3VL: Fix type hints 2025-11-13 16:55:11 +01:00
turboderp
1084881f98 Config: Consider token IDs in text_config section 2025-11-13 16:54:32 +01:00
Mamy Ratsimbazafy
ad1aab406f Add size estimation script for model tensors size 2025-11-13 13:49:39 +01:00
turboderp
08a82f36a3 Add Glm4V architecture 2025-11-13 13:00:19 +01:00
turboderp
7c6b6c473f chat.py: Allow breaking stream with esc 2025-11-13 12:55:32 +01:00
turboderp
0bd0bfa17d chat.py: Token dump feature 2025-11-13 12:54:34 +01:00
turboderp
c3886f0841 chat.py: Don't crash on wrong stop conditions 2025-11-13 12:53:23 +01:00
turboderp
1dff8123a6 chat.py: Better SVG extract 2025-11-13 12:52:46 +01:00
turboderp
792bd9dc75 Glm4V: Update examples 2025-11-13 12:52:46 +01:00
turboderp
e3d83c5670 (WIP) Vision tower TP split 2025-11-13 12:48:25 +01:00
turboderp
b937401715 Tokenizer: Fix encoding of unspecial added tokens 2025-11-13 12:46:53 +01:00
turboderp
605428a9cf RoPE kernel: Allow inv_freq (and M-RoPE) table to work with partial_rotary_factor 2025-11-13 12:46:02 +01:00
turboderp
2a4aac07d8 Bump to v0.0.14 v0.0.14 2025-11-10 01:35:08 +01:00
turboderp
fcce0e6985 Fix example Mistral template 2025-11-10 01:34:26 +01:00
turboderp
abd8ddccbf Conv: Fix regression for 2D input 2025-11-10 01:32:49 +01:00
turboderp
98e1c4017c Update README.md v0.0.13 2025-11-09 23:02:08 +01:00
turboderp
46c616a5f9 Bump to v0.0.13 2025-11-09 22:59:29 +01:00
turboderp
38ddd8b9c5 MMLU: Fix prompt 2025-11-09 22:25:53 +01:00
turboderp
1a298a8161 Generator: Fix job enqueue when max_new_tokens == 1 2025-11-09 21:41:42 +01:00
turboderp
03f3d9b861 Unpack SimpleNamespace dict for older Python versions 2025-11-09 15:46:58 +01:00
turboderp
8190932910 Update multimodal example 2025-11-09 13:34:11 +01:00
turboderp
3562dbe7b0 compare_q.py: Work around AutoAWQ being broken in later versions of Transformers 2025-11-09 13:33:40 +01:00
turboderp
ff89563719 Qwen3VL: Fix typo 2025-11-09 03:09:32 +01:00
turboderp
1739b92d50 DeepstackEmbed: Fix typo 2025-11-09 02:53:06 +01:00
turboderp
5d64b322f9 Add Qwen3VLMoe 2025-11-09 02:50:08 +01:00
turboderp
6bc327afda Allow loading fused expert tensors (new Qwen3 MoE format) 2025-11-09 02:49:18 +01:00
turboderp
20cbe13cab Update multimodal example 2025-11-09 01:42:30 +01:00
turboderp
da3ea0013c Add Qwen3VL 2025-11-09 01:41:43 +01:00
turboderp
663f3694d8 Linear: Account for bias when splitting fused tensor 2025-11-09 01:41:43 +01:00
turboderp
261621ce0a GatedDeltaNet: Add module name 2025-11-09 01:41:43 +01:00
turboderp
7d9ad806e6 RoPE: Add M-RoPE facilities 2025-11-09 01:41:43 +01:00
turboderp
fab5da05ce chat.py: Better SVG extraction 2025-11-02 23:54:22 +01:00
turboderp
3533b307c3 Update README.md 2025-11-01 19:42:18 +01:00
turboderp
0c0139d9cf Update docs 2025-11-01 19:40:58 +01:00
turboderp
e384d39e98 Bump to v0.0.12 v0.0.12 2025-11-01 18:21:50 +01:00