ik_llama.cpp

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-01-26 17:20:01 +00:00

Files

Kawrakow f8b66238fa Fused matrix multiplications (CUDA and CPU) (#796 )

* Quick attempt to fuse the Q, K, V GEMMs

Doesn't do much on the CPU

* Doesn't do much on the GPU either

* Use llm_build_mul_mat_qkv

* This is not needed

* Revert timing on committed by mistake

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>

2025-09-24 16:52:54 +02:00

cmake

Merge mainline llama.cpp (#3 )

2024-07-27 07:55:01 +02:00

include

Offload only activated experts to the GPU (#698 )

2025-09-04 12:22:30 +02:00

src

Fused matrix multiplications (CUDA and CPU) (#796 )

2025-09-24 16:52:54 +02:00

.gitignore

Merge mainline llama.cpp (#3 )

2024-07-27 07:55:01 +02:00

CMakeLists.txt

Set default value of GGML_SCHED_MAX_COPIES to 1 (#751 )

2025-09-02 07:04:39 +02:00