kingbri
bd16681825
Start: Mark cuda 11.8 as unsupported
...
Temporary until existing cuda 11.8 scripts can be migrated to cuda 12.
Signed-off-by: kingbri <8082010+bdashore3@users.noreply.github.com>
2025-01-12 21:50:41 -05:00
Brian
566e5b5937
Merge pull request #271 from lifo9/bump-formatron
...
Bump formatron to `0.4.11`
2025-01-07 23:19:35 -05:00
Jakub Filo
f8d9cfb5fd
Bump formatron to 0.4.11
2025-01-08 00:48:25 +01:00
kingbri
cfb439c0e6
Dependencies: Update exllamav2 and pytorch for ROCm
...
Exllama v0.2.7, pytorch v2.5.1 across all cards.
AMD now requires ROCm 6.2
Signed-off-by: kingbri <8082010+bdashore3@users.noreply.github.com>
2025-01-01 16:22:10 -05:00
kingbri
6da65a8fd3
Embeddings: Fix base64 return
...
A base64 embedding can be a string post-encoding.
Signed-off-by: kingbri <8082010+bdashore3@users.noreply.github.com>
2025-01-01 16:15:12 -05:00
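Note on the commit above: a minimal sketch of why the return type has to allow a string. It assumes embeddings are packed as little-endian float32 before base64 encoding, as OpenAI-style APIs do; `encode_embedding` and its parameters are hypothetical names, not TabbyAPI's actual code.

```python
import base64
import struct
from typing import List, Union


def encode_embedding(
    values: List[float], encoding_format: str = "float"
) -> Union[List[float], str]:
    """Return the embedding as a list of floats, or as a base64 string."""
    if encoding_format == "base64":
        # Assumption: pack as little-endian float32, then base64-encode.
        raw = struct.pack(f"<{len(values)}f", *values)
        return base64.b64encode(raw).decode("utf-8")
    return list(values)
```

A response model typed only as a list of floats would reject the base64 branch, which is the kind of mismatch the commit describes.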
kingbri
245bd5c008
Templates: Alter chatml_with_headers to fit huggingface spec
...
The previous template was compatible with Jinja2 in Python, but it
was not cross-platform compatible according to HF's standards.
Signed-off-by: kingbri <8082010+bdashore3@users.noreply.github.com>
2024-12-30 14:00:44 -05:00
Brian
709493837b
Merge pull request #264 from DocShotgun/robust-length-checking
...
Robust request length checking in generator
2024-12-26 23:37:53 -05:00
kingbri
b994aae995
Model: Cleanup generation length and page checks
...
Reduce the number of if statements and combine parts of the code.
Signed-off-by: kingbri <8082010+bdashore3@users.noreply.github.com>
2024-12-26 23:13:08 -05:00
kingbri
ba2579ff74
Merge branch 'main' into robust-length-checks
2024-12-26 18:00:26 -05:00
kingbri
7878d351a7
Endpoints: Add props endpoint and add more values to model params
...
The props endpoint is a standard used by llamacpp APIs which returns
various properties of a model to a server. It's still recommended to
use /v1/model to get all the parameters a TabbyAPI model has.
Also include the contents of a prompt template when fetching the current
model.
Signed-off-by: kingbri <8082010+bdashore3@users.noreply.github.com>
2024-12-26 17:32:19 -05:00
kingbri
fa8035ef72
Dependencies: Update sse-starlette and formatron
...
Also pin newer versions of dependencies and fix an import from sse-starlette
Signed-off-by: kingbri <8082010+bdashore3@users.noreply.github.com>
2024-12-21 23:14:55 -05:00
kingbri
b579fd46b7
Dependencies: Remove outlines from optional check
...
Outlines is no longer a dependency that's used in TabbyAPI.
Signed-off-by: kingbri <8082010+bdashore3@users.noreply.github.com>
2024-12-18 11:56:40 -05:00
DocShotgun
4d11323c17
Tree: Format
2024-12-17 09:37:33 -08:00
DocShotgun
5da335eb3d
Model: Robust request length checking in generator
...
* Ensure that length of positive/negative prompt + max_tokens does not exceed max_seq_len
* Ensure that total required pages for CFG request does not exceed allocated cache_size
2024-12-17 09:34:43 -08:00
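Note on the commit above: the two checks can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the commit: the 256-token page size, the function name, and all parameter names are hypothetical.

```python
import math
from typing import Optional

PAGE_SIZE = 256  # assumed cache page size in tokens


def check_request_length(
    prompt_len: int,
    max_tokens: int,
    max_seq_len: int,
    cache_size: int,
    negative_prompt_len: Optional[int] = None,
) -> int:
    """Validate a request and return the number of cache pages it needs."""
    lengths = [prompt_len]
    if negative_prompt_len is not None:
        # A CFG request carries a second (negative) prompt.
        lengths.append(negative_prompt_len)

    total_pages = 0
    for length in lengths:
        required = length + max_tokens
        if required > max_seq_len:
            raise ValueError(
                f"Prompt plus max_tokens ({required}) exceeds max_seq_len ({max_seq_len})"
            )
        total_pages += math.ceil(required / PAGE_SIZE)

    if total_pages * PAGE_SIZE > cache_size:
        raise ValueError(
            f"Request needs {total_pages} pages, more than cache_size ({cache_size}) allows"
        )
    return total_pages
```

Failing early with a clear message is the point of both checks: the request is rejected before generation starts rather than partway through.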
kingbri
c23e406f2d
Sampling: Add max_completion_tokens
...
Conforms with OAI's updated spec
Signed-off-by: kingbri <8082010+bdashore3@users.noreply.github.com>
2024-12-13 01:02:37 -05:00
kingbri
bc3c154c96
Dependencies: Pin tokenizers
...
Use a version greater than 0.20.0 for newer model support.
Signed-off-by: kingbri <8082010+bdashore3@users.noreply.github.com>
2024-12-13 00:58:25 -05:00
Brian
1ba33bf646
Merge pull request #252 from DocShotgun/main
...
Switch grammar backend to Formatron
2024-12-13 00:55:20 -05:00
kingbri
f25ac4b833
Dependencies: Update ExllamaV2
...
v0.2.6
Signed-off-by: kingbri <8082010+bdashore3@users.noreply.github.com>
2024-12-13 00:47:29 -05:00
kingbri
8df8ba3ddb
Dependencies: Update ExllamaV2
...
v0.2.6
Signed-off-by: kingbri <8082010+bdashore3@users.noreply.github.com>
2024-12-11 21:58:25 -05:00
DocShotgun
7f899734c0
Grammar: Cache the engine vocabulary
...
* Avoid rebuilding the KBNF engine vocabulary on every grammar-enabled request
2024-12-05 21:36:37 -08:00
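Note on the commit above: one common way to avoid rebuilding an expensive object on every request is memoization. The sketch below uses `functools.lru_cache` with a stand-in builder; it does not use Formatron's or KBNF's real API, and `get_vocabulary`, `tokenizer_id`, and `pieces` are hypothetical names.

```python
from functools import lru_cache
from typing import Dict, Tuple

build_count = 0  # counts real builds, for demonstration only


@lru_cache(maxsize=4)
def get_vocabulary(tokenizer_id: str, pieces: Tuple[str, ...]) -> Dict[int, str]:
    """Build the token-id-to-piece mapping once per tokenizer, then reuse it."""
    global build_count
    build_count += 1
    # Stand-in for the expensive vocabulary construction step.
    return {token_id: piece for token_id, piece in enumerate(pieces)}


# Two grammar-enabled requests against the same tokenizer trigger one build.
first = get_vocabulary("model-a", ("<s>", "hello", "world"))
second = get_vocabulary("model-a", ("<s>", "hello", "world"))
```

The arguments must be hashable for `lru_cache` to work, which is why the pieces are passed as a tuple.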
kingbri
8ccd7a12a2
Merge branch 'main' into formatron
2024-12-05 23:01:22 -05:00
kingbri
ac85e34356
Dependencies: Update Torch, FA2, and Exl2
...
Torch: 2.5, FA2 2.7.0.post2, Exl2 v0.2.5
Don't update torch for rocm as exl2 isn't built for rocm 6.2
Signed-off-by: kingbri <8082010+bdashore3@users.noreply.github.com>
2024-12-03 22:57:00 -05:00
kingbri
ca86ab5477
Dependencies: Remove CUDA 11.8
...
Most software has moved to CUDA 12 and cards that aren't supported by
11.8 don't use tabby anyways.
Signed-off-by: kingbri <8082010+bdashore3@users.noreply.github.com>
2024-12-03 22:37:03 -05:00
kingbri
3c4211c963
Dependencies: Ensure updated kbnf
...
Signed-off-by: kingbri <8082010+bdashore3@users.noreply.github.com>
2024-12-02 15:10:20 -05:00
Brian
fe44e4a524
Merge pull request #253 from randoentity/workaround-toolcall
...
workaround for tool calling
2024-11-28 23:30:00 -05:00
kingbri
2e06fb01d3
OAI: Pass mm_embeddings to tool call generation
...
Don't exclude the vision embeddings when regenerating for a tool call.
Signed-off-by: kingbri <bdashore3@proton.me>
2024-11-28 23:27:59 -05:00
Brian
b81dcdaf66
Merge pull request #232 from AlpinDale/serviceinfo_uri
...
feat: add serviceinfo URI
2024-11-28 23:19:52 -05:00
kingbri
5fadaa728a
API: Move serviceinfo to core
...
Best to expose this endpoint to all APIs, as it's an information endpoint.
Signed-off-by: kingbri <bdashore3@proton.me>
2024-11-28 23:07:58 -05:00
DocShotgun
6f2dc2ea99
Grammar: Fix syntax, lint
2024-11-24 11:35:45 -08:00
DocShotgun
8f209efb99
Grammar: Clean up KBNF implementation
...
* Also remove empty cache clear function
2024-11-24 10:44:45 -08:00
randoentity
a52610fb19
workaround for tool calling
2024-11-24 13:40:33 +01:00
DocShotgun
a9f39bcff3
Grammar: Preliminary Formatron KBNF support
2024-11-23 12:05:41 -08:00
DocShotgun
0836a9317f
Grammar: Initial Formatron regex and JSON schema implementation
...
* Replace LMFE's regex and JSON schema filters with Formatron's
* Remove Outlines EBNF filter in preparation for Formatron KBNF filter
* TODO: Implement Formatron KBNF filter
2024-11-23 10:27:37 -08:00
kingbri
aa4ccd03d4
Infinity: Use a runtime type hint for engine
...
Remove the antipattern of the conditional type for the Async engine
and use string-based type inference.
Signed-off-by: kingbri <bdashore3@proton.me>
2024-11-22 18:06:08 -05:00
kingbri
242ff4f892
Dependencies: Fix OpenAPI generation
...
The vision module from the ExllamaV2 backend is used in files outside
the backends folder. Therefore, import ExllamaV2 as an
optional dependency here.
Signed-off-by: kingbri <bdashore3@proton.me>
2024-11-22 17:59:20 -05:00
kingbri
9cd7fcaf99
Pyproject: Add pillow to deps
...
Signed-off-by: kingbri <bdashore3@proton.me>
2024-11-22 17:48:56 -05:00
Brian
9c8186c138
Merge pull request #249 from theroyallab/vision
...
Vision
2024-11-22 17:45:49 -05:00
kingbri
388d36e6bd
OAI: Fix chat completion list parsing
...
The strings weren't being concatenated properly. Only add the combined
text if the chat completion type is a List.
Signed-off-by: kingbri <bdashore3@proton.me>
2024-11-22 17:30:29 -05:00
kingbri
eadc71a4c3
Model: Add unload and error messages for vision
...
If vision is enabled and the model doesn't support it, send an
error asking the user to reload. Also, add a method to unload the
vision tower.
Signed-off-by: kingbri <bdashore3@proton.me>
2024-11-22 14:25:03 -05:00
kingbri
c49047eea1
Model: Fix load packets
...
The model_type internal reference was changed to an enum for
a more extendable loading process. Return the current model type
when loading a new model.
Signed-off-by: kingbri <bdashore3@proton.me>
2024-11-21 18:06:47 -05:00
kingbri
0ab393f09c
Model: Set vision load to False by default
...
Mistake in unwrapping. Vision should be false to allow normal model
loading when the flag isn't provided.
Signed-off-by: kingbri <bdashore3@proton.me>
2024-11-21 17:54:42 -05:00
kingbri
902045edbb
API: Fix chat completion formatting flow
...
Previously, the flow for parsing chat completion messages and rendering
from the prompt template was disconnected between endpoints. Now, create
a common function to render and handle everything appropriately afterwards.
Signed-off-by: kingbri <bdashore3@proton.me>
2024-11-21 17:51:14 -05:00
kingbri
c652a6e030
API: Transform multimodal into an actual class
...
Migrate the add method into the class itself. Also, a BaseModel isn't
needed here since this isn't a serialized class.
Signed-off-by: kingbri <bdashore3@proton.me>
2024-11-20 00:06:20 -05:00
kingbri
8ffc636dce
OAI: Strictly type chat completions
...
Previously, the messages were a list of dicts. These are untyped
and don't provide strict hinting. Add types for chat completion
messages and reformat existing code.
Signed-off-by: kingbri <bdashore3@proton.me>
2024-11-19 23:18:18 -05:00
kingbri
0fadb1e5e8
Merge branch 'main' into vision
2024-11-19 21:19:21 -05:00
DocShotgun
731a345cfc
OAI: Keep behavior consistent between chat completion and encode
...
* When vision is not enabled, only the first text block is kept in message.content if it is a list
2024-11-19 12:40:00 -08:00
DocShotgun
27d9af50a8
API: Report whether vision is enabled
2024-11-19 12:29:25 -08:00
DocShotgun
5611365c07
OAI: Allow /v1/encode endpoint to handle vision requests
...
* More robust checks for OAI chat completion message lists on /v1/encode endpoint
* Added TODO to support other aspects of chat completions
* Fix oversight where embeddings was not defined in advance on /v1/chat/completions endpoint
2024-11-19 11:14:37 -08:00
DocShotgun
c42655336b
Config: Add option to disable fetching content from URLs
2024-11-17 23:05:17 -08:00
Brian
a69f86098a
Merge pull request #243 from DocShotgun/chunk-size-fix
...
Enforce chunk_size as multiple of 256
2024-11-18 00:40:36 -05:00