mirror of
https://github.com/ikawrakow/ik_llama.cpp.git
synced 2026-04-30 11:21:56 +00:00
🔀 #90 - iq4_ks: faster dot product on Metal
| Author | ikawrakow |
|---|---|
| State | ❌ Closed |
| Created | 2024-10-16 |
| Updated | 2024-10-16 |
Description
Haha, I keep forgetting that the Metal compiler often needs a hand to produce fast code.
In this particular instance, we gain almost 8.5% token generation (TG) speedup for IQ4_KS:
TG-128(LLaMA-3.1-8B) goes to 52.5 t/s up from 48.4 t/s on my M2-Max 30-core GPU.
The actual computation did not change in any way; we just helped the compiler fetch data more effectively.
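
For readers unfamiliar with this kind of change, here is a rough illustration in plain C of what "same computation, better data fetching" can look like. It is only a sketch and not the Metal kernel from this PR: the 32-value block layout, the `nibble - 8` decoding, and the function names `dot_bytewise` and `dot_wordwise` are all made up for the example. Both functions compute the same integer dot product; the second one fetches the packed 4-bit data as four 32-bit words and separates the nibbles with a mask instead of touching one byte at a time.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical block: 32 4-bit values packed into 16 bytes.
 * Low nibble of byte i is element i, high nibble is element i + 16.
 * Decoding is simply (nibble - 8). This is NOT the real IQ4_KS layout. */

/* Straightforward version: one byte fetched and unpacked per iteration. */
int32_t dot_bytewise(const uint8_t *q, const int8_t *y) {
    int32_t sum = 0;
    for (int i = 0; i < 16; ++i) {
        sum += ((int32_t)(q[i] & 0x0F) - 8) * y[i];
        sum += ((int32_t)(q[i] >> 4) - 8) * y[i + 16];
    }
    return sum;
}

/* Same arithmetic, but the packed data is fetched as four 32-bit words
 * and the nibbles of four bytes are separated with a single mask each. */
int32_t dot_wordwise(const uint8_t *q, const int8_t *y) {
    uint32_t w[4];
    memcpy(w, q, sizeof w);              /* four wide loads instead of 16 byte loads */

    int32_t sum = 0;
    for (int k = 0; k < 4; ++k) {
        uint32_t lo = w[k] & 0x0F0F0F0Fu;         /* low nibbles of 4 bytes  */
        uint32_t hi = (w[k] >> 4) & 0x0F0F0F0Fu;  /* high nibbles of 4 bytes */

        /* Copy back to bytes so the result does not depend on endianness. */
        uint8_t lo_b[4], hi_b[4];
        memcpy(lo_b, &lo, sizeof lo_b);
        memcpy(hi_b, &hi, sizeof hi_b);

        for (int j = 0; j < 4; ++j) {
            int i = 4 * k + j;
            sum += ((int32_t)lo_b[j] - 8) * y[i];
            sum += ((int32_t)hi_b[j] - 8) * y[i + 16];
        }
    }
    return sum;
}
```

Calling both functions on the same `q` (16 packed bytes) and `y` (32 `int8_t` activations) should return identical sums. Whether the word-wise form is actually faster depends entirely on the compiler and hardware, which is the point: on some targets the compiler finds the wide loads on its own, and on others it needs a hand.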