ik_llama.cpp

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-02-25 07:34:10 +00:00

Files

Iwan Kawrakow f05484d9a3 FlashMLA-2: eliminate intermediate f32 tensors

This works on the CPU. Prompt-processing (PP) performance is ~13% better for 16k tokens,
and the compute buffer is noticeably smaller.

2025-03-12 10:45:36 +02:00

.gitignore        2024-07-27 07:55:01 +02:00
CMakeLists.txt    2025-02-09 18:59:33 +02:00