ikawrakow/ik_llama.cpp

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-01-26 17:20:01 +00:00

📝 #183 - Refactor: iqk_mul_mat

| **Author** | `ikawrakow` |
| :--- | :--- |
| **State** | ❌ Closed |
| **Created** | 2025-01-30 |
| **Updated** | 2025-05-22 |

Description

Background Description

`iqk_mul_mat.cpp` compilation time has become unacceptably long. If I keep going this way, it will soon rival CUDA build times.

At some point, as an experiment, I factored the Flash Attention (FA) part out of the matrix multiplication code. This resulted in an FA build time of ~45 seconds and a GEMM/GEMV build time of ~30 seconds. That is better than the ~75 seconds I observe for `iqk_mul_mat.cpp` on my Ryzen-7950X, but still far from really useful, so I did not commit the change.

Possible Refactor Approaches

No response
