Offload only activated experts to the GPU (#698)

* Offload only activated experts

* This seems to do the trick for -fmoe

* Do not recalculate activated experts for fused up/gate

* Log out of bounds access details

* Add a command line argument
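
The idea in the bullets above can be illustrated with a minimal sketch. It is not the code from this commit: `active_experts` and `copy_active_experts` are hypothetical helpers, the flat `ids` layout is an assumption, and a plain `memcpy` stands in for the host-to-GPU transfer. The only name taken from the diff is `ggml_backend_sched_set_only_active_experts`, which toggles the behaviour on the scheduler.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <vector>

// Hypothetical helper (not part of ggml): collect the distinct expert ids
// chosen by the router. `ids` is assumed to hold the top-k expert ids of all
// tokens, flattened. Out-of-bounds ids are logged and skipped.
std::vector<int32_t> active_experts(const std::vector<int32_t> & ids, int32_t n_expert) {
    assert(n_expert > 0);
    std::vector<bool> used(n_expert, false);
    for (size_t i = 0; i < ids.size(); ++i) {
        const int32_t id = ids[i];
        if (id < 0 || id >= n_expert) {
            fprintf(stderr, "expert id %d at position %zu is out of bounds [0, %d)\n",
                    id, i, n_expert);
            continue;
        }
        used[id] = true;
    }
    std::vector<int32_t> result;
    for (int32_t e = 0; e < n_expert; ++e) {
        if (used[e]) result.push_back(e);
    }
    return result; // sorted ascending, no duplicates
}

// Hypothetical helper: copy only the slices of the active experts.
// `src` and `dst` are assumed to store experts back to back, `expert_size`
// bytes each. Returns the number of bytes copied.
size_t copy_active_experts(const uint8_t * src, uint8_t * dst, size_t expert_size,
                           const std::vector<int32_t> & active) {
    size_t copied = 0;
    for (int32_t e : active) {
        std::memcpy(dst + e*expert_size, src + e*expert_size, expert_size);
        copied += expert_size;
    }
    return copied;
}
```

With 8 experts and router ids `{3, 1, 3, 7, 1}`, only the slices of experts 1, 3 and 7 would be transferred instead of all 8. A real implementation would likely merge adjacent active experts into a single copy to reduce the number of transfers.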

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Kawrakow
2025-09-04 12:22:30 +02:00
committed by GitHub
parent 144d456717
commit 13c3b6412e
8 changed files with 155 additions and 45 deletions


@@ -210,6 +210,7 @@ extern "C" {
// enable or disable op offload for a given op
GGML_API void ggml_backend_sched_set_op_offload(ggml_backend_sched_t sched, enum ggml_op op, bool on_or_off);
GGML_API void ggml_backend_sched_set_only_active_experts(ggml_backend_sched_t sched, bool on_or_off);
//
// Utils