Be able to set a max. number of GPUs to be used in split mode graph (#1051)

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2026-02-20 21:24:08 +00:00 · 2025-12-11 07:22:53 +01:00
parent 6a5a707ac0
commit 9484d150d8
6 changed files with 67 additions and 19 deletions
--- a/include/llama.h
+++ b/include/llama.h
@@ -362,6 +362,7 @@ extern "C" {
        // LLAMA_SPLIT_ROW: the GPU that is used for small tensors and intermediate results
        // LLAMA_SPLIT_LAYER: ignored
        int32_t main_gpu;
+        int32_t max_gpu;

        // proportion of the model (layers or rows) to offload to each GPU, size: llama_max_devices()
        const float * tensor_split;