ik_llama.cpp/ggml
Commit bdf4f0ddce by Kawrakow: Even more fused ops (#868)
* Fuse Q, K, V gemv+add

* More gemv+add fusing

* Faster copy when tensors are contiguous

This is relevant for storing data into the KV cache. I see a ~1% speedup
for fast models (Ling-mini-2.0, gpt-oss-20b, etc.).

* Cleanup

* Make sure the bias really is a single row before applying the fusion

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2025-10-27 16:09:01 +02:00
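
The fused gemv+add ops described in the commit fold the bias addition into the matrix-vector product itself instead of running it as a separate pass. The C sketch below is not the actual ik_llama.cpp/ggml kernel (function names and signatures are made up for illustration); it only shows why the fusion pays off: the bias seeds the accumulator, so each output element is computed and stored once rather than being re-read and rewritten by a second add pass. The "bias really is 1 row" check presumably guards that the add being fused is a plain per-row bias and not a general tensor add.

    #include <stddef.h>

    /* Unfused: y = W*x followed by y += b.  The output vector is written by
     * the gemv, then re-read and rewritten by the separate bias pass. */
    static void gemv_then_add(const float *W, const float *x, const float *b,
                              float *y, size_t rows, size_t cols) {
        for (size_t i = 0; i < rows; ++i) {
            float sum = 0.0f;
            for (size_t j = 0; j < cols; ++j) {
                sum += W[i*cols + j] * x[j];
            }
            y[i] = sum;
        }
        for (size_t i = 0; i < rows; ++i) {
            y[i] += b[i];
        }
    }

    /* Fused gemv+add: the bias seeds the accumulator, so each output element
     * is computed and stored exactly once and the extra pass disappears. */
    static void gemv_add_fused(const float *W, const float *x, const float *b,
                               float *y, size_t rows, size_t cols) {
        for (size_t i = 0; i < rows; ++i) {
            float sum = b[i];
            for (size_t j = 0; j < cols; ++j) {
                sum += W[i*cols + j] * x[j];
            }
            y[i] = sum;
        }
    }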
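
The "faster copy when tensors are contiguous" item is the usual contiguity fast path: when the rows of source and destination are packed back to back, the whole block can be moved with a single memcpy instead of a strided, row-by-row copy, which matters when K/V rows are written into the cache on every token. The helper below (copy_rows_2d is a hypothetical name, not a ggml function) sketches that idea in plain C.

    #include <stddef.h>
    #include <string.h>

    /* Hypothetical 2-D byte copy.  row_bytes is the size of one row;
     * src_stride/dst_stride are the distances between consecutive rows. */
    static void copy_rows_2d(char *dst, size_t dst_stride,
                             const char *src, size_t src_stride,
                             size_t rows, size_t row_bytes) {
        if (dst_stride == row_bytes && src_stride == row_bytes) {
            /* Both views are contiguous: one bulk copy for the whole block. */
            memcpy(dst, src, rows * row_bytes);
            return;
        }
        /* Generic strided path: one copy per row. */
        for (size_t i = 0; i < rows; ++i) {
            memcpy(dst + i * dst_stride, src + i * src_stride, row_bytes);
        }
    }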