ik_llama.cpp

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-03-02 18:10:02 +00:00

Files

Kawrakow a87e54db6e Flash MLA (CPU only) (#240)

* FlashMLA - it finally works (on the CPU)

* FlashMLA: allow for f16 and bf16 cache in addition to q8_0

* It works with ggml FA, not with iqk FA

* WIP

* FlashMLA: it now works with iqk

I had forgotten to divide the Q stride by sizeof(float), and
that's why, very confusingly, it was working for TG but not for PP.

* WIP

* FlashMLA: that should be it for now

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>

2025-03-03 15:17:51 +02:00
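
The stride bug mentioned in the commit message above can be illustrated with a minimal sketch. It is not the code from the commit: the helper names `element_offset`, `buggy_element_offset`, and `first_of_row` are hypothetical. It assumes a row stride given in bytes (as ggml tensor strides are) applied to a `float` pointer. With a single row, as is typical for token generation (TG), the row index is 0 and both offsets agree, which would explain why the error only showed up with the multiple rows of prompt processing (PP).

```cpp
#include <cstddef>

// Hypothetical helper: element offset of row `row` in a float buffer
// whose row stride is given in BYTES.
inline std::size_t element_offset(std::size_t stride_bytes, std::size_t row) {
    // Convert the byte stride to an element stride before indexing a float*.
    return row * (stride_bytes / sizeof(float));
}

// Hypothetical buggy variant: uses the byte stride as if it were an
// element stride, so every row after the first lands too far ahead.
inline std::size_t buggy_element_offset(std::size_t stride_bytes, std::size_t row) {
    return row * stride_bytes;
}

// Reads the first value of a row using the corrected offset.
inline float first_of_row(const float* q, std::size_t stride_bytes, std::size_t row) {
    return q[element_offset(stride_bytes, row)];
}
```

For row 0 both helpers return offset 0, so a single-row workload cannot expose the difference; for any later row the buggy offset is `sizeof(float)` times too large.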

CMakeLists.txt

Be able to repack tensors at run time (#147)

2024-12-17 14:16:34 +01:00

llama-grammar.cpp

Merge mainline - Aug 12 2024 (#17)

2024-08-12 15:14:32 +02:00

llama-grammar.h

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

llama-impl.h

Time to fix replace_all (#68)