ik_llama.cpp

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-02-10 16:30:12 +00:00

Files

Kawrakow a313b71bf8 DeepSeek FA optimizations (#929)

* Use new-new-mma also for MLA=3, and use mask bounds

This gives us ~25% better PP at 32k tokens compared to main

* This seems better

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>

2025-11-10 09:55:30 +02:00

cmake

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

include

CUDA: set compute parameters via command line arguments (#910)

2025-11-07 07:11:23 +02:00

src

DeepSeek FA optimizations (#929)

2025-11-10 09:55:30 +02:00

.gitignore

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

CMakeLists.txt

Disable CUDA fusion by default for now (#903)

2025-11-05 10:58:12 +02:00