ik_llama.cpp

Mirror of https://github.com/ikawrakow/ik_llama.cpp.git (synced 2026-02-20 05:04:11 +00:00)

Files

Kawrakow 35374bc7e8 Metal implementation for the trellis quants (#475)

* iq2_kt: Metal dequantize

* iq2_kt: Metal GEMV

Performance is actually quite decent: 52 t/s on my M2 Max for LLaMA-3.1-8B.

* iq3_kt: Metal dequantize

* iq3_kt: Metal GEMV

Performance is not as good as iq2_kt: 40 t/s on my M2 Max for LLaMA-3.1-8B.
Flipping signs is a costly affair.

* iq4_kt: Metal dequantize (getting NaNs)

* iq4_kt: Metal GEMV (also not working)

* iq4_kt: Metal still not working

* Disable iq4_kt on Metal for now

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>

2025-06-01 15:23:44 +03:00

cmake

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

include

Trellis quants with CPU inference (#441)

2025-05-23 09:17:52 +03:00

src

Metal implementation for the trellis quants (#475)

2025-06-01 15:23:44 +03:00

.gitignore

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

CMakeLists.txt

Option to enable/disable the IQK CPU FA kernels (#429)

2025-05-17 11:21:58 +03:00