gpt-oss: WIP llama

Model loads and runs (CPU only), but PPL is much too high
(~1500 for the 1st batch vs ~200 in mainline).
Is it because of SWA, because of vocab, or did I introduce a bug somewhere?
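For reference, the PPL figures above are perplexity: the exponential of the negative mean per-token log-likelihood. A minimal sketch of the computation (function name hypothetical, not from this repo):

```python
import math

def perplexity(logprobs):
    """Perplexity from per-token natural-log probabilities:
    exp of the negative mean log-likelihood."""
    return math.exp(-sum(logprobs) / len(logprobs))

# A model assigning p = 0.5 to every token has perplexity 2.
print(perplexity([math.log(0.5)] * 8))  # → 2.0
```

A PPL of ~1500 vs ~200 on the same batch means the average per-token probability is far lower than mainline's, which is consistent with either wrong attention masking (SWA) or a vocab/tokenization mismatch rather than mere numerical noise.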
Iwan Kawrakow
2025-08-10 10:09:42 +03:00
parent e24a1d3eda
commit c69d04f324
2 changed files with 463 additions and 157 deletions

File diff suppressed because it is too large.