mirror of
https://github.com/ikawrakow/ik_llama.cpp.git
synced 2026-02-23 06:34:13 +00:00
* Optimizing q3next TG
* Fused add -> softplus -> mul on CUDA
* Remove forgotten debug log
* Increase ggml context size

  Required for Qwen3-Next with batch/u-batch size of 4096
* WIP
* Avoid some contiguous ops
* Avoid some repeats
* Avoid some more repeats
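
For readers unfamiliar with the fused add -> softplus -> mul step, here is a minimal CPU reference sketch of what such a fusion computes. It is not the CUDA kernel from this commit. The function names, the operand order (`softplus(a + b) * c`), and the overflow threshold of 20 are assumptions made for illustration only.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Softplus: log(1 + exp(x)). For large x, exp(x) overflows in float and
// softplus(x) is approximately x, so we return x directly.
// The threshold 20.0f is an assumption chosen for this sketch.
static inline float softplus_ref(float x) {
    return x > 20.0f ? x : std::log1p(std::exp(x));
}

// Hypothetical fused op: dst[i] = softplus(a[i] + b[i]) * c[i].
// One pass over the data, no intermediate buffers.
static void fused_add_softplus_mul(const float* a, const float* b,
                                   const float* c, float* dst,
                                   std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = softplus_ref(a[i] + b[i]) * c[i];
    }
}

// Unfused equivalent: three passes and two temporaries. On a GPU each pass
// would be its own kernel launch with a round trip through global memory.
static void unfused_add_softplus_mul(const float* a, const float* b,
                                     const float* c, float* dst,
                                     std::size_t n) {
    std::vector<float> sum(n), sp(n);
    for (std::size_t i = 0; i < n; ++i) sum[i] = a[i] + b[i];         // add
    for (std::size_t i = 0; i < n; ++i) sp[i]  = softplus_ref(sum[i]); // softplus
    for (std::size_t i = 0; i < n; ++i) dst[i] = sp[i] * c[i];         // mul
}
```

Both functions produce the same result. The fused form avoids the two temporaries and the extra passes, which is the usual motivation for fusing element-wise ops in a token-generation path.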