tabbyAPI

mirror of https://github.com/theroyallab/tabbyAPI.git synced 2026-05-11 08:20:08 +00:00

kingbri 871c89063d Model: Add Tensor Parallel support

Use the tensor parallel loader when the flag is enabled. The new loader
has its own autosplit implementation, so gpu_split_auto isn't valid
here.

Also simplify how the cache type is selected, replacing the chain of
if/else statements.

Signed-off-by: kingbri <bdashore3@proton.me>

2024-08-22 14:15:19 -04:00
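
The two changes described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the cache classes are empty stand-ins, and the names `CACHE_CLASSES`, `pick_cache_class`, `pick_loader`, and the mode strings are hypothetical. Falling back to the FP16 cache for an unknown mode is also an assumption.

```python
# Hypothetical stand-ins for the backend's cache classes (not the real ones).
class FP16Cache: ...
class Q8Cache: ...
class Q6Cache: ...
class Q4Cache: ...

# A lookup table replaces a chain of if/else statements.
CACHE_CLASSES = {
    "FP16": FP16Cache,
    "Q8": Q8Cache,
    "Q6": Q6Cache,
    "Q4": Q4Cache,
}


def pick_cache_class(cache_mode: str) -> type:
    """Return the cache class for a mode string (assumed fallback: FP16)."""
    return CACHE_CLASSES.get(cache_mode.upper(), FP16Cache)


def pick_loader(tensor_parallel: bool, gpu_split_auto: bool) -> str:
    """Choose a loading strategy label.

    The tensor parallel loader does its own splitting, so the
    gpu_split_auto setting is ignored whenever tensor_parallel is set.
    """
    if tensor_parallel:
        return "tensor_parallel"
    if gpu_split_auto:
        return "autosplit"
    return "manual_split"


if __name__ == "__main__":
    print(pick_cache_class("q4").__name__)
    print(pick_loader(tensor_parallel=True, gpu_split_auto=True))
```

The table keeps the mode-to-class mapping in one place, so adding a cache type means adding one entry rather than another branch.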

exllamav2   Model: Add Tensor Parallel support                     2024-08-22 14:15:19 -04:00
infinity    Embeddings: Update config, args, and parameter names   2024-07-30 15:32:26 -04:00