ik_llama.cpp

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-02-23 14:44:09 +00:00

Files

Iwan Kawrakow 9a790a8905 Introducing rope cache

When computing RoPE, the rotation angles are exactly the same
in every layer: they depend only on the token positions
(and on other constant, model-dependent parameters).
So why not compute the angles just once and then reuse them
for the Q and K RoPE in each layer?

This commit does so as a proof of concept (POC) on the CPU,
and uses it in the Qwen3-MoE compute graph.

2025-11-03 08:30:32 +02:00
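
The idea in the commit message above can be illustrated with a minimal, standalone sketch. This is not the code from the commit: `RopeCache`, `build_rope_cache`, and `apply_rope_cached` are hypothetical names, the tensor layout `[n_tokens][n_heads][head_dim]` and the adjacent-pair rotation are assumptions, and frequency scaling and other model-specific corrections are left out. It only shows that the cos/sin values depend on token position and head dimension, so they can be computed once per batch and reused for Q and K in every layer.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical cache (not the commit's data structure): cos/sin of the rotation
// angle for every (token, frequency pair), computed once per batch.
struct RopeCache {
    int n_tokens = 0;
    int half_dim = 0;           // head_dim / 2
    std::vector<float> cos_v;   // size n_tokens * half_dim
    std::vector<float> sin_v;   // size n_tokens * half_dim
};

// Compute the angles once. theta(t, i) = pos[t] * freq_base^(-2i / head_dim).
// Assumption: plain RoPE without frequency scaling or YaRN-style corrections.
RopeCache build_rope_cache(const std::vector<int>& positions, int head_dim, float freq_base) {
    assert(head_dim > 0 && head_dim % 2 == 0);
    RopeCache c;
    c.n_tokens = static_cast<int>(positions.size());
    c.half_dim = head_dim / 2;
    c.cos_v.resize(static_cast<std::size_t>(c.n_tokens) * c.half_dim);
    c.sin_v.resize(static_cast<std::size_t>(c.n_tokens) * c.half_dim);
    for (int t = 0; t < c.n_tokens; ++t) {
        for (int i = 0; i < c.half_dim; ++i) {
            const float inv_freq = std::pow(freq_base, -2.0f * i / head_dim);
            const float theta = positions[t] * inv_freq;
            const std::size_t idx = static_cast<std::size_t>(t) * c.half_dim + i;
            c.cos_v[idx] = std::cos(theta);
            c.sin_v[idx] = std::sin(theta);
        }
    }
    return c;
}

// Rotate one tensor in place using the cached values. The same cache can be
// passed for Q and for K, in every layer, with no further sin/cos/pow calls.
// Assumption: x is laid out as [n_tokens][n_heads][head_dim] and adjacent
// elements (x[2i], x[2i+1]) form a rotation pair; other pairing conventions exist.
void apply_rope_cached(std::vector<float>& x, int n_heads, const RopeCache& c) {
    const int head_dim = 2 * c.half_dim;
    assert(x.size() == static_cast<std::size_t>(c.n_tokens) * n_heads * head_dim);
    for (int t = 0; t < c.n_tokens; ++t) {
        for (int h = 0; h < n_heads; ++h) {
            float* row = x.data() + (static_cast<std::size_t>(t) * n_heads + h) * head_dim;
            for (int i = 0; i < c.half_dim; ++i) {
                const std::size_t idx = static_cast<std::size_t>(t) * c.half_dim + i;
                const float cs = c.cos_v[idx];
                const float sn = c.sin_v[idx];
                const float x0 = row[2 * i];
                const float x1 = row[2 * i + 1];
                row[2 * i]     = x0 * cs - x1 * sn;
                row[2 * i + 1] = x0 * sn + x1 * cs;
            }
        }
    }
}
```

In a graph with many layers, `build_rope_cache` would be called once per batch and `apply_rope_cached` twice per layer (once for Q, once for K), so the trigonometric work no longer scales with the number of layers.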

ggml-alloc.h

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

ggml-backend.h

Offload only activated experts to the GPU (#698)

2025-09-04 12:22:30 +02:00

ggml-blas.h

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

ggml-cann.h

Merge mainline llama.cpp (#3)