Slightly faster TG for split mode "graph" (#1057)

* Rearrange graph nodes so that we can do graph portions that are the same on 2 or more GPUs at the same time.
* Separate graph compute implementation for split mode graph
* This is better

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2026-04-29 02:41:47 +00:00 · 2025-12-12 07:54:37 +01:00
parent bf03f63c34
commit f65fefa36c
4 changed files with 183 additions and 91 deletions
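
The header change in this commit adds a boolean setter, `ggml_backend_sched_set_split_mode_graph(sched, on_or_off)`. A minimal sketch of how a caller might drive a toggle of that shape is given here. Everything in it is a stand-in so that it compiles on its own: `mock_sched`, `mock_sched_set_split_mode_graph`, the `split_mode_graph` field and the `configure_split_mode` helper are hypothetical names, and the "two or more GPUs" policy is an assumption drawn from the commit message, not code taken from this commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the opaque scheduler; the real struct is internal to ggml. */
struct mock_sched {
    bool split_mode_graph; /* hypothetical field name */
};
typedef struct mock_sched * mock_sched_t;

/* Mock with the same shape as the setter declared in the header:
 *   void ggml_backend_sched_set_split_mode_graph(ggml_backend_sched_t sched,
 *                                                bool on_or_off);
 */
static void mock_sched_set_split_mode_graph(mock_sched_t sched, bool on_or_off) {
    assert(sched != NULL);
    sched->split_mode_graph = on_or_off;
}

/* Hypothetical caller-side policy: only request the separate graph-compute
 * path when there are at least two GPUs to share identical graph portions. */
static void configure_split_mode(mock_sched_t sched, int n_gpus) {
    mock_sched_set_split_mode_graph(sched, n_gpus >= 2);
}
```

With the real library the call site would presumably be a single line, `ggml_backend_sched_set_split_mode_graph(sched, true);`, made after the scheduler has been created.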
--- a/ggml/include/ggml-backend.h
+++ b/ggml/include/ggml-backend.h
@@ -211,6 +211,7 @@ extern "C" {
    // enable or disable op offload for a given op
    GGML_API void                 ggml_backend_sched_set_op_offload(ggml_backend_sched_t sched, enum ggml_op op, bool on_or_off);
    GGML_API void                 ggml_backend_sched_set_only_active_experts(ggml_backend_sched_t sched, bool on_or_off);
+    GGML_API void                 ggml_backend_sched_set_split_mode_graph(ggml_backend_sched_t sched, bool on_or_off);

    //
    // Utils