ik_llama.cpp

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-04-21 15:09:40 +00:00

Files

Kawrakow 6970ef925f CUDA: small PP performance improvement for MoE models (#589)

* Trying to implement quantized fmoe - not working yet

* This works, but is slower than the non-working version

* quantize_mmq_q8_1_id

* Minor

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>

2025-07-07 07:23:12 +02:00

cmake

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

include

Change KQ mask padding to 64 (#574)

2025-07-03 10:43:27 +02:00

src

CUDA: small PP performance improvement for MoE models (#589)

2025-07-07 07:23:12 +02:00

.gitignore

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

CMakeLists.txt

Merge vulkan code from mainline up to commit of 6/28/2025 (#563)

2025-07-02 08:49:42 +02:00