Mirror of https://github.com/ikawrakow/ik_llama.cpp.git (synced 2026-02-22 06:04:24 +00:00)
Split mode "graph" for Cohere2 (#1061)
* This works and TG (token generation) is decent, but PP (prompt processing) is low
* Better
* Apply f_logit_scale before the mul mat with the output tensor (see the sketch after this message)
* This is better for PP: 600 t/s -> 700 t/s
* To not lose this again
* WIP
* Equal split
* WIP

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
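The third bullet moves the Cohere2 logit scale from the logits onto the hidden state. Below is a minimal sketch of that reordering, assuming a typical ggml graph-build context; the function name build_output_logits and the tensor names cur and output_w are illustrative, not taken from the actual patch. Scaling the hidden state (n_embd x n_tokens elements) touches far fewer values than scaling the full logits (n_vocab x n_tokens), which is consistent with the PP gain noted in the next bullet.

    // Minimal sketch (assumed context, not the actual patch): apply the
    // Cohere2 logit scale to the hidden state *before* the final matmul
    // with the output tensor, instead of scaling the logits afterwards.
    static struct ggml_tensor * build_output_logits(
            struct ggml_context * ctx,
            struct ggml_tensor  * cur,       // hidden state,  [n_embd, n_tokens]
            struct ggml_tensor  * output_w,  // output weight, [n_embd, n_vocab]
            float                 f_logit_scale) {
        cur = ggml_scale(ctx, cur, f_logit_scale);   // scale n_embd*n_tokens values
        return ggml_mul_mat(ctx, output_w, cur);     // logits, [n_vocab, n_tokens]
    }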
@@ -1729,6 +1729,7 @@ static bool is_model_split_supported(const llama_model & model) {
         LLM_ARCH_QWEN3MOE,
         LLM_ARCH_GLM4_MOE,
         LLM_ARCH_MISTRAL3,
+        LLM_ARCH_COHERE2,
     };
     auto it = k_supported.find(model.arch);
     return it != k_supported.end();
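For context, the hunk above is the tail of the helper that gates split mode "graph" by architecture. A minimal sketch of the whole function follows; only the lines visible in the hunk are taken from the source, while the std::set<llm_arch> container type and the elided members are assumptions.

    // Sketch of the surrounding helper (assumed shape beyond the hunk above):
    // split mode "graph" is enabled by listing the architecture in a static
    // set and doing a simple membership test.
    static bool is_model_split_supported(const llama_model & model) {
        static const std::set<llm_arch> k_supported = {
            // ... other supported architectures ...
            LLM_ARCH_QWEN3MOE,
            LLM_ARCH_GLM4_MOE,
            LLM_ARCH_MISTRAL3,
            LLM_ARCH_COHERE2,   // added by this commit
        };
        auto it = k_supported.find(model.arch);
        return it != k_supported.end();
    }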