ik_llama.cpp

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-05-11 08:30:19 +00:00

Files

Kawrakow cafeef484c More Qwen3-Next optimizations (#1277 )

* Optimizing q3next TG

* Fused add -> softplus -> mul on CUDA

* Remove forgotten debug log

* Increase ggml context size

Required for Qwen3-Next with batch/u-batch size of 4096

* WIP

* Avoid some contiguous ops

* Avoid some repeats

* Avoid some more repeats

2026-02-17 16:03:51 +01:00

cmake

Merge mainline llama.cpp (#3 )

2024-07-27 07:55:01 +02:00

include

WIP: Qwen3Next (#1266 )

2026-02-16 06:50:28 +01:00

src

More Qwen3-Next optimizations (#1277 )

2026-02-17 16:03:51 +01:00

.gitignore

Merge mainline llama.cpp (#3 )

2024-07-27 07:55:01 +02:00

CMakeLists.txt

Remove llamafile remnants (#1179 )

2026-01-22 13:20:23 +02:00