ik_llama.cpp/ggml
Iwan Kawrakow f05484d9a3 FlashMLA-2: eliminate intermediate f32 tensors
This works on the CPU. Prompt processing (PP) performance is ~13% better
for 16k tokens, and the compute buffer is quite a bit smaller.
2025-03-12 10:45:36 +02:00