ik_llama.cpp/ggml
Kawrakow a313b71bf8 DeepSeek FA optimizations (#929)
* Use new-new-mma also for MLA=3, and use mask bounds

This gives us ~25% better PP at 32k tokens compared to main

* This seems better

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2025-11-10 09:55:30 +02:00