ik_llama.cpp/87 - iq3_k_ fix and optimize Metal dot product.md at main - ik_llama.cpp

ikawrakow/ik_llama.cpp

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-01-26 17:20:01 +00:00

Thomas eaa2510a28 Add GitHub data: filename sanitization (#640 )

2025-07-23 13:31:53 +02:00

🐛 #87 - iq3_k: fix and optimize Metal dot product

Author	`ikawrakow`
State	❌ Closed
Created	2024-10-14
Updated	2024-10-14

Description

I was accessing the scales as if they were 4-byte aligned, but IQ3_K blocks are not 4-byte aligned. Instead of throwing an error (as happens on CUDA when one makes a mistake like this), Metal silently accepts the misaligned access and we get garbage. Worse, the garbage does not show up right away, where it would be easy to notice; it only appears after some tokens have been generated.
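
To illustrate the kind of mistake described above, here is a minimal host-side C++ sketch. It is not the Metal kernel from this PR: the `Block` struct is a hypothetical stand-in (not the real IQ3_K layout), and `scales_offset`, `load_scales_memcpy`, and `load_scales_u16` are names invented for this example. It only shows why a block whose size is not a multiple of 4 leaves the scales misaligned for 32-bit loads in some blocks, and two ways to read them that do not assume 4-byte alignment.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical block layout (NOT the real IQ3_K struct). The widest member
// is 2 bytes, so the struct is only 2-byte aligned and its size is 14 bytes.
struct Block {
    uint16_t d;          // stand-in for an fp16 block scale
    uint8_t  scales[4];  // packed per-group scales
    uint8_t  qs[8];      // packed quants
};
static_assert(sizeof(Block) == 14, "block size is not a multiple of 4");
static_assert(alignof(Block) == 2, "block is only 2-byte aligned");

// Byte offset of block i's scales from the start of a block array.
// Even if the array itself starts on a 4-byte boundary, this offset is a
// multiple of 4 only for some values of i.
constexpr std::size_t scales_offset(std::size_t i) {
    return i * sizeof(Block) + offsetof(Block, scales);
}

// The problematic pattern would be a direct 32-bit load such as
//   *reinterpret_cast<const uint32_t *>(b.scales)
// which assumes 4-byte alignment that the layout does not guarantee.

// Safe option 1: copy the bytes; no alignment assumption at all.
inline uint32_t load_scales_memcpy(const Block & b) {
    uint32_t v;
    std::memcpy(&v, b.scales, sizeof(v));
    return v;
}

// Safe option 2: two 16-bit reads combined into one 32-bit value. This
// matches the memcpy result on little-endian targets (an assumption here).
inline uint32_t load_scales_u16(const Block & b) {
    uint16_t lo, hi;
    std::memcpy(&lo, b.scales,     sizeof(lo));
    std::memcpy(&hi, b.scales + 2, sizeof(hi));
    return uint32_t(lo) | (uint32_t(hi) << 16);
}
```

With this hypothetical layout, `scales_offset(0)` is 2 and `scales_offset(1)` is 16, so only every other block happens to have 4-byte-aligned scales. A pattern like that would explain why a misaligned 32-bit read can go unnoticed for a while.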

The PR also makes a minor optimization to the Metal dot product (~2.5% speedup).
