gpt-oss: WIP llama

Model loads and runs (CPU only), but PPL is much too high
(~1500 for the 1st batch vs ~200 in mainline).
Is it because of SWA, because of vocab, or did I introduce a bug somewhere?
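For reference, the PPL figures above are perplexity: the exponential of the negative mean per-token log-likelihood. A minimal sketch of the computation (function name hypothetical, not from this repo):

```python
import math

def perplexity(logprobs):
    """Perplexity from per-token natural-log probabilities:
    exp of the negative mean log-likelihood."""
    return math.exp(-sum(logprobs) / len(logprobs))

# A model assigning p = 0.5 to every token has perplexity 2.
print(perplexity([math.log(0.5)] * 8))  # → 2.0
```

A PPL of ~1500 vs ~200 on the same batch means the average per-token probability is far lower than mainline's, which is consistent with either wrong attention masking (SWA) or a vocab/tokenization mismatch rather than mere numerical noise.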
Iwan Kawrakow
2025-08-10 10:09:42 +03:00
parent e24a1d3eda
commit c69d04f324
2 changed files with 463 additions and 157 deletions

File diff suppressed because it is too large.