ik_llama.cpp

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-03-06 03:50:08 +00:00

Files

Kawrakow 250c325e7e iq3_k: fix and optimize Metal dot product (#87)

* iq3_k: fix Metal dot product

I was accessing the scales as 4-byte aligned, but iq3_k is
not 4-byte aligned. Instead of throwing an error (as happens
on CUDA when one makes this mistake), Metal silently accepts
the misaligned access and we get garbage.

* iq3_k: slightly faster Metal dot product

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>

2024-10-14 10:46:41 +03:00
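
The alignment bug described in the commit message above can be illustrated on the CPU side. What follows is a minimal C++ sketch, not the actual Metal kernel and not the real iq3_k layout: `DemoBlock`, `load_scales_le`, `load_scales_memcpy`, and `self_check` are hypothetical names, and the field sizes are chosen only so that the struct ends up 2-byte aligned. It shows two ways to read four packed scale bytes as one 32-bit value without assuming 4-byte alignment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical block layout (an assumption, NOT the real iq3_k struct).
// No member needs more than 2-byte alignment, so neither does the struct.
struct DemoBlock {
    uint16_t d;          // stand-in for a half-precision block scale
    uint8_t  scales[4];  // four packed 8-bit scales, at byte offset 2
    uint8_t  qs[8];      // stand-in for the quantized values
};
static_assert(alignof(DemoBlock) == 2, "block is only 2-byte aligned");
static_assert(sizeof(DemoBlock) == 14, "no padding expected");
static_assert(offsetof(DemoBlock, scales) == 2, "scales start at offset 2");

// Unsafe pattern (shown only as a comment):
//   uint32_t v = *reinterpret_cast<const uint32_t*>(b.scales);
// This assumes 4-byte alignment that the layout does not provide.

// Option 1: assemble the value byte by byte in little-endian order.
// Works for any alignment and any host byte order.
inline uint32_t load_scales_le(const DemoBlock& b) {
    return  uint32_t(b.scales[0])
         | (uint32_t(b.scales[1]) << 8)
         | (uint32_t(b.scales[2]) << 16)
         | (uint32_t(b.scales[3]) << 24);
}

// Option 2: copy the raw bytes into a properly aligned local variable.
// The result is in host byte order.
inline uint32_t load_scales_memcpy(const DemoBlock& b) {
    uint32_t v;
    std::memcpy(&v, b.scales, sizeof v);
    return v;
}

// Small self-check: the memcpy variant must reproduce the original bytes.
inline void self_check(const DemoBlock& b) {
    const uint32_t v = load_scales_memcpy(b);
    uint8_t out[4];
    std::memcpy(out, &v, sizeof out);
    assert(std::memcmp(out, b.scales, sizeof out) == 0);
}
```

In an array of `DemoBlock`, the `scales` field of block i sits at byte offset 14*i + 2, so, assuming the array itself starts on a 4-byte boundary, a direct 32-bit load would be misaligned for every even i. That is the kind of access the commit message says CUDA rejects and Metal silently accepts.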

cmake

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

include

IQ2_KS: 2.1875 bpw non-linear quantization (#85)

2024-10-13 13:34:30 +03:00

src

iq3_k: fix and optimize Metal dot product (#87)

2024-10-14 10:46:41 +03:00

.gitignore

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

CMakeLists.txt

Move to C++17 projectwide (#80)

2024-10-04 14:43:26 +03:00