mirror of
https://github.com/ikawrakow/ik_llama.cpp.git
synced 2026-04-21 23:19:22 +00:00
Slightly faster TG for split mode "graph" (#1057)
* Rearrange graph nodes

  So that we can do graph portions that are the same on 2 or more GPUs at the same time.

* Separate graph compute implementation for split mode graph

* This is better

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
@@ -1228,6 +1228,7 @@ llm_expert_gating_func_type gating_op,
             cur = ggml_cast(ctx, cur, GGML_TYPE_F16);
             cb(cur, "ffn_out_f16", il_cb);
         }
+        ggml_build_forward_expand(graph, routed_out);
         results.push_back(cur);
     }
     GGML_ASSERT(!results.empty());
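The hunk adds an early `ggml_build_forward_expand` call, and the commit message says the goal is to rearrange graph nodes so that identical graph portions on different GPUs can run at the same time. The sketch below is a toy model of that idea, not the ik_llama.cpp or ggml implementation. `ToyTensor`, `ToyGraph`, `toy_build_forward_expand`, and `toy_node_order` are hypothetical names. It assumes only that a forward-expand call appends a tensor's not-yet-scheduled dependencies, and then the tensor itself, to the node list.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-ins for a tensor and a compute graph (not the real ggml structs).
struct ToyTensor {
    std::string name;
    std::vector<ToyTensor *> src;  // tensors this one depends on
};

struct ToyGraph {
    std::vector<ToyTensor *> nodes;           // evaluation order
    std::unordered_set<ToyTensor *> visited;  // tensors already scheduled
};

// Post-order walk: schedule unvisited dependencies first, then the tensor itself.
void toy_build_forward_expand(ToyGraph & g, ToyTensor * t) {
    if (!g.visited.insert(t).second) return;  // already in the node list
    for (ToyTensor * s : t->src) toy_build_forward_expand(g, s);
    g.nodes.push_back(t);
}

// Builds a two-device toy graph: a0/a1 are the same stage on device 0/1,
// b0/b1 the next stage, and "out" combines both devices.
std::vector<std::string> toy_node_order(bool expand_early) {
    ToyTensor inp{"inp", {}};
    ToyTensor a0{"a0", {&inp}}, a1{"a1", {&inp}};
    ToyTensor b0{"b0", {&a0}},  b1{"b1", {&a1}};
    ToyTensor out{"out", {&b0, &b1}};

    ToyGraph g;
    if (expand_early) {
        // Pin the per-device stage-a nodes next to each other before going deeper.
        toy_build_forward_expand(g, &a0);
        toy_build_forward_expand(g, &a1);
    }
    toy_build_forward_expand(g, &out);

    std::vector<std::string> order;
    for (ToyTensor * t : g.nodes) order.push_back(t->name);
    return order;
}
```

Without the early calls, the depth-first walk finishes one device's whole branch before starting the other (`inp, a0, b0, a1, b1, out`). With them, nodes of the same stage sit next to each other (`inp, a0, a1, b0, b1, out`), which is the kind of ordering a scheduler could exploit to launch matching work on several GPUs together. Whether the real change works exactly this way is an assumption.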