Use standard attention for Ministral3 (#1032)

This required adding the "temperature scaling" to the standard attention
implementation, but in return split mode "graph" is automatically supported.
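The "temperature scaling" mentioned above can be sketched as dividing the attention logits by a temperature factor before the softmax. This is a minimal illustrative sketch, not the actual ik_llama.cpp kernel; the function name, the plain `std::vector` interface, and the place where the temperature is applied are all assumptions for illustration.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Sketch: softmax over a row of attention logits with a temperature factor.
// Dividing logits by `temperature` before exponentiation flattens (T > 1) or
// sharpens (T < 1) the resulting attention distribution. Illustrative only.
std::vector<float> softmax_with_temperature(const std::vector<float> & logits,
                                            float temperature) {
    std::vector<float> probs(logits.size());
    // subtract the max logit for numerical stability
    float max_logit = *std::max_element(logits.begin(), logits.end());
    float sum = 0.0f;
    for (size_t i = 0; i < logits.size(); ++i) {
        probs[i] = std::exp((logits[i] - max_logit) / temperature);
        sum += probs[i];
    }
    for (float & p : probs) {
        p /= sum;
    }
    return probs;
}
```

With `temperature = 1.0f` this reduces to a plain softmax; larger values spread attention more evenly across positions.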

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Author: Kawrakow
Date: 2025-12-03 13:43:31 +01:00 (committed by GitHub)
Parent: 7fbe8d3ac2
Commit: 90f36eb517
3 changed files with 18 additions and 50 deletions


@@ -1728,6 +1728,7 @@ static bool is_model_split_supported(const llama_model & model) {
         LLM_ARCH_LLAMA,
         LLM_ARCH_QWEN3MOE,
         LLM_ARCH_GLM4_MOE,
+        LLM_ARCH_MISTRAL3,
     };
     auto it = k_supported.find(model.arch);
     return it != k_supported.end();
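The hunk above enables the new architecture by adding it to a static set of supported architectures that the function looks up. A self-contained sketch of that pattern, assuming a simplified stand-in enum (`ARCH_MISTRAL3` etc.) in place of the real `llm_arch` values:

```cpp
#include <set>

// Stand-in for the real llm_arch enum; names are illustrative.
enum llm_arch_sketch {
    ARCH_LLAMA,
    ARCH_QWEN3MOE,
    ARCH_GLM4_MOE,
    ARCH_MISTRAL3,
    ARCH_OTHER,
};

// Sketch of the lookup pattern in is_model_split_supported(): a static set of
// supported architectures, with membership tested via find().
static bool is_split_supported_sketch(llm_arch_sketch arch) {
    static const std::set<llm_arch_sketch> k_supported = {
        ARCH_LLAMA,
        ARCH_QWEN3MOE,
        ARCH_GLM4_MOE,
        ARCH_MISTRAL3,   // the architecture added by this commit
    };
    return k_supported.find(arch) != k_supported.end();
}
```

Keeping the set `static const` means it is built once; extending support to a new architecture is then a one-line addition to the initializer, which is exactly what the diff does.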