Mirror of https://github.com/ikawrakow/ik_llama.cpp.git (synced 2026-03-10 22:10:20 +00:00)
* WIP
* WIP
* WIP
* WIP
* WIP
* WIP
* WIP Loads and starts running, then crashes with an illegal memory access in quantize_mmq_q8_1. This almost always indicates NaNs in the input to the MoE FFN part.
* WIP
* WIP Loads and runs, but produces wrong results (very high PPL). Performance looks promising: around 25% better than the previous sm graph. Needs f32 or bf16 graph reduce type.
* WIP - still wrong
* Fix after rebase
* WIP
* WIP
* This seems to be working for dense Qwen3.5!
* WIP: Qwen3-Next is not quite working
* Some cleanup
* Disable Qwen3-Next for now
* Disable graph parallel when mmproj was specified
* Read/write split recurrent state
* That should not crash
* Re-enable vision - it works now
* Recurrent layers are now counted for the split cache