Files
ik_llama.cpp/ggml/src/ggml-cuda
Kawrakow 86e2bec04e DeepSeek FA optimizations (#929)
* Use new-new-mma also for MLA=3, and use mask bounds

This gives us ~25% better PP (prompt processing) performance at 32k tokens compared to main.

* This seems better

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2025-11-10 09:55:30 +02:00