
634 B

Raw Blame History

🔀 #90 - iq4_ks: faster dot product on Metal

| Field | Value |
|---|---|
| Author | `ikawrakow` |
| State | ❌ Closed |
| Created | 2024-10-16 |
| Updated | 2024-10-16 |

Description

Haha, I keep forgetting that the Metal compiler often needs a hand to produce fast code. In this particular instance, we gain almost 8.5% in token generation (TG) speed for IQ4_KS: TG-128 with LLaMA-3.1-8B goes up from 48.4 t/s to 52.5 t/s on my M2-Max with a 30-core GPU. The actual computation did not change in any way; we just helped the compiler fetch data more effectively.
