ik_llama.cpp/ggml
Kawrakow f90b4c2f27 Full graph parallel for Qwen3.5 (dense and MoE) (#1388)
* WIP

* WIP

* WIP

* WIP

* WIP

* WIP

* WIP

Loads and starts running, but crashes with an illegal memory access in
quantize_mmq_q8_1. This almost always indicates NaNs in the input
to the MoE FFN part.

* WIP

* WIP

Loads and runs, but produces wrong results (very high PPL).
Performance looks promising: around 25% better than the previous sm graph.
Needs an f32 or bf16 graph reduce type.

* WIP - still wrong

* Fix after rebase

* WIP

* WIP

* This seems to be working for dense Qwen3.5!!!

* WIP: Qwen3-Next is not quite working

* Some cleanup

* Disable Qwen3-Next for now

* Disable graph parallel when an mmproj is specified

* Read/write split recurrent state

* That should not crash

* Re-enable vision - it works now

* Recurrent layers should now be counted for split cache
2026-03-10 09:08:24 +01:00