Mirror of https://github.com/theroyallab/tabbyAPI.git, synced 2026-04-26 09:18:53 +00:00
Model: Add Tensor Parallel support
Use the tensor parallel loader when the flag is enabled. The new loader has its own autosplit implementation, so gpu_split_auto isn't valid here. Also simplify determining which cache type to use, replacing multiple if/else statements.

Signed-off-by: kingbri <bdashore3@proton.me>
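The commit message mentions replacing multiple if/else statements for cache-type selection with something simpler. A common pattern for this is a lookup table mapping mode strings to cache classes. The sketch below is illustrative only; the class and mode names (`CacheFP16`, `CacheQ4`, etc.) are hypothetical and not tabbyAPI's actual identifiers:

```python
# Hypothetical sketch: replace an if/else chain for cache selection
# with a dict lookup. Class and mode names are placeholders, not
# tabbyAPI's real ones.

class CacheFP16:
    """Placeholder for a full-precision KV cache."""

class CacheQ8:
    """Placeholder for an 8-bit quantized KV cache."""

class CacheQ4:
    """Placeholder for a 4-bit quantized KV cache."""

# One table instead of a chain of if/elif branches
CACHE_CLASSES = {
    "FP16": CacheFP16,
    "Q8": CacheQ8,
    "Q4": CacheQ4,
}

def pick_cache_class(cache_mode: str):
    """Look up the cache class for a mode string, defaulting to FP16."""
    return CACHE_CLASSES.get(cache_mode, CacheFP16)
```

Adding a new cache type then only requires a new dictionary entry rather than another branch.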
@@ -107,6 +107,11 @@ def add_model_args(parser: argparse.ArgumentParser):
         type=str_to_bool,
         help="Overrides base model context length",
     )
+    model_group.add_argument(
+        "--tensor-parallel",
+        type=str_to_bool,
+        help="Use tensor parallelism to load models",
+    )
     model_group.add_argument(
         "--gpu-split-auto",
         type=str_to_bool,
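The diff passes `type=str_to_bool` so the flag accepts an explicit true/false value rather than acting as a store-true switch. A minimal self-contained sketch of how such a converter and argument group fit together is shown below; this `str_to_bool` body is an assumption, not necessarily tabbyAPI's implementation:

```python
import argparse

def str_to_bool(value: str) -> bool:
    """Convert common truthy/falsy spellings to bool (illustrative sketch)."""
    lowered = value.lower()
    if lowered in ("true", "1", "yes", "y"):
        return True
    if lowered in ("false", "0", "no", "n"):
        return False
    # argparse reports this as a clean usage error
    raise argparse.ArgumentTypeError(f"invalid boolean value: {value!r}")

parser = argparse.ArgumentParser()
model_group = parser.add_argument_group("model")
model_group.add_argument(
    "--tensor-parallel",
    type=str_to_bool,
    help="Use tensor parallelism to load models",
)

# Example invocation: --tensor-parallel true
args = parser.parse_args(["--tensor-parallel", "true"])
```

Because the converter raises `argparse.ArgumentTypeError`, a malformed value like `--tensor-parallel maybe` produces a normal argparse usage error instead of a traceback.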