ik_llama.cpp/ggml
Kawrakow f90b4c2f27 Full graph parallel for Qwen3.5 (dense and MoE) (#1388)
* WIP

* WIP

* WIP

* WIP

* WIP

* WIP

* WIP

Loads and starts running, but crashes with an illegal memory access in
quantize_mmq_q8_1. This almost always indicates NaNs in the input
to the MoE FFN part.

* WIP

* WIP

Loads and runs, but produces wrong results (very high PPL).
Performance looks promising: around 25% better than the previous sm graph.
Needs an f32 or bf16 graph reduce type.

* WIP - still wrong

* Fix after rebase

* WIP

* WIP

* This seems to be working for dense Qwen3.5!!!

* WIP: Qwen3-Next is not quite working

* Some cleanup

* Disable Qwen3-Next for now

* Disable graph parallel when an mmproj is specified

* Read/write split recurrent state

* That should not crash

* Re-enable vision - it works now

* Recurrent layers should now be counted for split cache
2026-03-10 09:08:24 +01:00