mirror of
https://github.com/ikawrakow/ik_llama.cpp.git
synced 2026-04-30 11:21:56 +00:00
Better PP performance with split mode "graph" and 3+ GPUs (#1069)
* This should do the trick for PP * Command line option to set max. extra VRAM that the scheduler can use * Fix bug and cleanup * Looks like with this change it is working with tensor overrides * Nah, it is not working * OK, this seems to be working * Disable split scheduling with tensor overrides --------- Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
This commit is contained in:
@@ -212,6 +212,7 @@ extern "C" {
|
||||
GGML_API void ggml_backend_sched_set_op_offload(ggml_backend_sched_t sched, enum ggml_op op, bool on_or_off);
|
||||
GGML_API void ggml_backend_sched_set_only_active_experts(ggml_backend_sched_t sched, bool on_or_off);
|
||||
GGML_API void ggml_backend_sched_set_split_mode_graph(ggml_backend_sched_t sched, bool on_or_off);
|
||||
GGML_API void ggml_backend_sched_set_max_extra_alloc(ggml_backend_sched_t sched, int extra_alloc_MiB);
|
||||
|
||||
//
|
||||
// Utils
|
||||
|
||||
Reference in New Issue
Block a user