mirror of
https://github.com/ikawrakow/ik_llama.cpp.git
synced 2026-04-26 01:19:20 +00:00
Offload only activated experts to the GPU (#698)
* Offload only activated experts
* This seems to do the trick for -fmoe
* Do not recalculate activated experts for fused up/gate
* Log out of bounds access details
* Add a command line argument

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
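
The diff for this commit is not shown on this page, so the following is only a minimal sketch of the general idea named in the commit message, not the code that was merged. It assumes the router produces a list of selected expert indices and that each expert's weights occupy one fixed-size slice of a contiguous buffer. The names `collect_active_experts`, `copy_active_experts`, `ids`, `n_expert`, and `expert_bytes` are hypothetical, and `std::memcpy` stands in for a host-to-device copy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <vector>

// Hypothetical helper (not taken from the commit): return the sorted,
// de-duplicated expert indices referenced by the router output `ids`.
// Out-of-range indices are logged to stderr and skipped.
std::vector<int32_t> collect_active_experts(const std::vector<int32_t>& ids,
                                            int32_t n_expert) {
    assert(n_expert > 0);
    std::vector<bool> used(static_cast<size_t>(n_expert), false);
    for (size_t i = 0; i < ids.size(); ++i) {
        const int32_t id = ids[i];
        if (id < 0 || id >= n_expert) {
            std::fprintf(stderr,
                         "expert id %d at position %zu is out of bounds [0, %d)\n",
                         id, i, n_expert);
            continue;
        }
        used[static_cast<size_t>(id)] = true;
    }
    std::vector<int32_t> active;
    for (int32_t e = 0; e < n_expert; ++e) {
        if (used[static_cast<size_t>(e)]) active.push_back(e);
    }
    return active;
}

// Hypothetical helper: copy only the slices of the active experts from `src`
// to `dst`. Both buffers hold one slice of `expert_bytes` bytes per expert.
// Returns the number of bytes copied.
size_t copy_active_experts(const std::vector<uint8_t>& src,
                           std::vector<uint8_t>& dst,
                           const std::vector<int32_t>& active,
                           size_t expert_bytes) {
    assert(expert_bytes > 0 && src.size() == dst.size());
    size_t copied = 0;
    for (int32_t e : active) {
        const size_t offset = static_cast<size_t>(e) * expert_bytes;
        assert(offset + expert_bytes <= src.size());
        std::memcpy(dst.data() + offset, src.data() + offset, expert_bytes);
        copied += expert_bytes;
    }
    return copied;
}
```

In this sketch the active set is computed once and can be reused by several consumers, which is one plausible reading of "Do not recalculate activated experts for fused up/gate". The amount of data copied scales with the number of distinct experts selected rather than with the total expert count.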
2439
ggml/src/ggml-backend.cpp
Normal file
File diff suppressed because it is too large