gpt-oss: WIP llama

Model loads and runs (CPU only), but PPL is much too high
(~1500 for the 1st batch vs ~200 in mainline).
Is it because of SWA, because of vocab, or did I introduce a bug somewhere?
This commit is contained in:
Iwan Kawrakow
2025-08-10 10:09:42 +03:00
parent e24a1d3eda
commit c69d04f324
2 changed files with 463 additions and 157 deletions


@@ -22067,6 +22067,7 @@ static int ggml_get_n_tasks(struct ggml_tensor * node, int n_threads) {
case GGML_UNARY_OP_GELU_QUICK:
case GGML_UNARY_OP_SILU:
case GGML_UNARY_OP_SWIGLU:
case GGML_UNARY_OP_SWIGLU_OAI:
{
n_tasks = n_threads;
} break;

File diff suppressed because it is too large.