Mimo-V2-Flash support (#1096)

* Mimo-2 support

* Fix bug for head sizes not being the same

It still does not solve the Mimo-2 quantized cache issue.

* Fix quantized cache

* Minor

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2026-02-23 06:34:13 +00:00 · 2026-01-05 08:00:01 +02:00
parent 1401326916
commit 8a6622eb4f
12 changed files with 251 additions and 54 deletions
--- a/src/llama.cpp
+++ b/src/llama.cpp
@@ -4905,6 +4905,7 @@ enum llama_rope_type llama_rope_type(const struct llama_model * model) {
        case LLM_ARCH_OPENAI_MOE:
        case LLM_ARCH_BAILINGMOE2:
        case LLM_ARCH_MINIMAX_M2:
+        case LLM_ARCH_MIMO2:
            return LLAMA_ROPE_TYPE_NEOX;

        case LLM_ARCH_QWEN2VL: