Be able to set a max. number of GPUs to be used in split mode graph (#1051)

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Kawrakow
2025-12-11 07:22:53 +01:00
committed by GitHub
parent 6a5a707ac0
commit 9484d150d8
6 changed files with 67 additions and 19 deletions

@@ -362,6 +362,7 @@ extern "C" {
// LLAMA_SPLIT_ROW: the GPU that is used for small tensors and intermediate results
// LLAMA_SPLIT_LAYER: ignored
int32_t main_gpu;
+int32_t max_gpu;
// proportion of the model (layers or rows) to offload to each GPU, size: llama_max_devices()
const float * tensor_split;
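
A minimal sketch of how a cap like `max_gpu` might be applied, for illustration only. The hunk shows just the new field, so the struct `split_params`, the helper `effective_gpu_count`, and the convention that a value `<= 0` means "no limit" are all assumptions here, not the actual code of this commit.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the fields visible in the hunk.
 * This is NOT the real parameter struct from llama.h. */
struct split_params {
    int32_t       main_gpu;      /* meaning depends on the split mode */
    int32_t       max_gpu;       /* assumed: <= 0 means "no limit" */
    const float * tensor_split;  /* per-device proportions, may be NULL */
};

/* Hypothetical helper: number of devices that would take part in the
 * split, given n_devices available and the cap in p->max_gpu. */
int32_t effective_gpu_count(const struct split_params * p, int32_t n_devices) {
    if (n_devices <= 0) {
        return 0;                /* nothing to split across */
    }
    if (p->max_gpu <= 0 || p->max_gpu > n_devices) {
        return n_devices;        /* no cap, or cap larger than what exists */
    }
    return p->max_gpu;           /* cap applies */
}
```

Under these assumptions, a caller with four devices and `max_gpu = 2` would split across two of them, while `max_gpu = 0` would leave all four in use.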