tabbyAPI

mirror of https://github.com/theroyallab/tabbyAPI.git synced 2026-04-20 06:19:15 +00:00

Files

kingbri 871c89063d Model: Add Tensor Parallel support

Use the tensor parallel loader when the flag is enabled. The new loader
has its own autosplit implementation, so gpu_split_auto isn't valid
here.

Also simplify how the cache type is determined instead of using
multiple if/else statements.

Signed-off-by: kingbri <bdashore3@proton.me>
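
The two changes described in the commit message above can be illustrated with a minimal sketch. This is not the code from the commit: the cache classes, the `CACHE_CLASSES` table, `LoadOptions`, `select_cache_class`, and `plan_load` are hypothetical names used for illustration only, and the cache mode strings are assumed. Only the `gpu_split_auto` option name and the idea of a tensor parallel flag come from the commit message.

```python
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical stand-ins for the backend's cache classes (assumption).
class CacheFP16: ...
class CacheQ8: ...
class CacheQ6: ...
class CacheQ4: ...


# A lookup table replaces a chain of if/else statements.
# The mode strings here are assumed, not taken from the repository.
CACHE_CLASSES = {
    "FP16": CacheFP16,
    "Q8": CacheQ8,
    "Q6": CacheQ6,
    "Q4": CacheQ4,
}


def select_cache_class(cache_mode: str):
    """Return the cache class for a mode string, case-insensitively."""
    try:
        return CACHE_CLASSES[cache_mode.upper()]
    except KeyError:
        raise ValueError(f"Unknown cache mode: {cache_mode!r}") from None


@dataclass
class LoadOptions:
    """Hypothetical load options; only gpu_split_auto is named in the commit."""
    tensor_parallel: bool = False
    gpu_split_auto: bool = True
    gpu_split: Optional[List[float]] = None


def plan_load(opts: LoadOptions) -> dict:
    """Pick a loader and decide whether gpu_split_auto applies."""
    if opts.tensor_parallel:
        # The tensor parallel loader does its own autosplit,
        # so gpu_split_auto is ignored on this path.
        return {
            "loader": "tensor_parallel",
            "gpu_split_auto": False,
            "gpu_split": opts.gpu_split,
        }
    return {
        "loader": "default",
        "gpu_split_auto": opts.gpu_split_auto,
        "gpu_split": opts.gpu_split,
    }


if __name__ == "__main__":
    print(select_cache_class("q4").__name__)
    print(plan_load(LoadOptions(tensor_parallel=True)))
```

Adding a cache type in this sketch means adding one entry to the table, which is the kind of simplification a lookup offers over branching.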

2024-08-22 14:15:19 -04:00

args.py

Model: Add Tensor Parallel support

2024-08-22 14:15:19 -04:00

auth.py

Auth: Fix disable auth when checking for key permissions

2024-07-26 15:04:29 -04:00

concurrency.py

API + Model: Add blocks and checks for various load requests

2024-05-25 21:16:14 -04:00

config.py

Embeddings: Update config, args, and parameter names