ik_llama.cpp/ggml
Kawrakow 0c02e16a39 Faster DeepSeek FA on CUDA (#408)
* New DeepSeek FlashMLA

Does not work because the RoPE portion is stored at the end
in our case, while in mainline it is stored at the beginning,
and the FA kernel assumes that.

* Rearrange MLA K cache so it fits the new CUDA FA implementation

* constexpr and minor changes

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2025-05-12 07:49:00 +03:00
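The commit message explains that each MLA K-cache row stores the RoPE portion at the end, while the FlashMLA-style kernel expects it at the beginning. The following is a minimal CPU-side sketch of that row rearrangement, not the CUDA code from the commit. The dimensions (a 512-value latent part followed by a 64-value RoPE part, taken from DeepSeek's kv_lora_rank and qk_rope_head_dim) and the helper names `rope_first` and `rearrange_k_cache` are assumptions for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed DeepSeek MLA dimensions (illustrative, not read from ggml).
constexpr std::size_t kLatentDim = 512; // compressed KV latent part
constexpr std::size_t kRopeDim   = 64;  // RoPE part
constexpr std::size_t kRowSize   = kLatentDim + kRopeDim;

// Hypothetical helper: convert one row from [latent | rope] to
// [rope | latent] in place. std::rotate makes row[latent_dim] the new
// first element, so the RoPE values move to the front.
template <typename T>
void rope_first(T* row, std::size_t latent_dim, std::size_t rope_dim) {
    std::rotate(row, row + latent_dim, row + latent_dim + rope_dim);
}

// Hypothetical helper: apply the rearrangement to every full row of a
// contiguous cache. A trailing partial row, if any, is left untouched.
template <typename T>
void rearrange_k_cache(std::vector<T>& cache, std::size_t latent_dim,
                       std::size_t rope_dim) {
    const std::size_t row = latent_dim + rope_dim;
    for (std::size_t off = 0; off + row <= cache.size(); off += row) {
        rope_first(cache.data() + off, latent_dim, rope_dim);
    }
}
```

Storing the RoPE part first in the cache lets the kernel keep its single fixed layout instead of handling both orderings. Whether that matches how the commit actually lays out the cache is an assumption here.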