Files
ik_llama.cpp/ggml/src
Kawrakow 35374bc7e8 Metal implementatio for the trellis quants. (#475)
* iq2_kt: Metal dequantize

* iq2_kt: Metal GEMV

Performance is actually quite decent: 52 t/s on my M2-Max for LlaMA-3.1-8B

* iq3_kt: Metal dequantize

* iq3_kt: Metal GEMV

Performance is not as good as iq2_kt: 40 t/s on my M2-Max for LlaMA-3.1-8B.
Flipping signs is a costly affair.

* iq4_kt: Metal dequantize - getting NaNs

* iq4_kt: Metal GEMV - also not working

* iq4_kt: Metal still not working

* Disable iq4_kt on Metal for now

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2025-06-01 15:23:44 +03:00
..
2024-07-27 07:55:01 +02:00
2024-07-27 07:55:01 +02:00
2024-07-27 07:55:01 +02:00
2025-05-20 17:03:14 +03:00
2024-08-12 15:14:32 +02:00