mirror of https://github.com/ikawrakow/ik_llama.cpp.git
synced 2026-01-26 17:20:01 +00:00
* Copy reduce result to other GPUs if necessary
* Avoid ggml_get_rows for TG
* For the output ops use the result of the split that ran on the main GPU
* More models