gpt-oss: WIP llama

Model loads and runs (CPU only), but PPL is much too high
(~1500 for the 1st batch vs ~200 in mainline).
Is it because of SWA, because of vocab, or did I introduce a bug somewhere?
This commit is contained in:
Iwan Kawrakow
2025-08-10 10:09:42 +03:00
parent e24a1d3eda
commit c69d04f324
2 changed files with 463 additions and 157 deletions


@@ -22067,6 +22067,7 @@ static int ggml_get_n_tasks(struct ggml_tensor * node, int n_threads) {
case GGML_UNARY_OP_GELU_QUICK:
case GGML_UNARY_OP_SILU:
case GGML_UNARY_OP_SWIGLU:
case GGML_UNARY_OP_SWIGLU_OAI:
{
n_tasks = n_threads;
} break;

File diff suppressed because it is too large.