mirror of
https://github.com/ikawrakow/ik_llama.cpp.git
synced 2026-04-29 02:41:47 +00:00
Slightly faster TG for split mode "graph" (#1057)
* Rearrange graph nodes So that we can do graph portions that are the same on 2 or more GPUs at the same time. * Separate graph compute implementation for split mode graph * This is better --------- Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
This commit is contained in:
@@ -211,6 +211,7 @@ extern "C" {
|
||||
// enable or disable op offload for a given op
|
||||
GGML_API void ggml_backend_sched_set_op_offload(ggml_backend_sched_t sched, enum ggml_op op, bool on_or_off);
|
||||
GGML_API void ggml_backend_sched_set_only_active_experts(ggml_backend_sched_t sched, bool on_or_off);
|
||||
GGML_API void ggml_backend_sched_set_split_mode_graph(ggml_backend_sched_t sched, bool on_or_off);
|
||||
|
||||
//
|
||||
// Utils
|
||||
|
||||
Reference in New Issue
Block a user