ik_llama.cpp/github-data/pull_requests/62 - Use fp32 for K_Q in Metal FA implementation.md
2025-07-23 13:31:53 +02:00


🔀 #62 - Use fp32 for K*Q in Metal FA implementation

Author ikawrakow
State Closed
Created 2024-09-25
Updated 2024-09-25

Description

Otherwise, some models (e.g., Qwen2-7B-Instruct) produce garbage output. Borrowed from PR-9595 in mainline llama.cpp.

Strangely enough, K*Q is computed in fp16 in my ARM_NEON FA implementation, and it works just fine there.
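The failure mode behind this change can be illustrated without Metal at all: the K*Q dot products in flash attention sum many products per head dimension, and an fp16 accumulator (max finite value 65504) can overflow to inf for activations that an fp32 accumulator handles exactly. The sketch below simulates an fp16-accumulating dot product in numpy; the values and `head_dim` are made up for illustration, and this is not the actual Metal kernel.

```python
import numpy as np

def dot_fp16(k, q):
    # Simulate a dot product whose accumulator is also fp16,
    # as in a hypothetical all-fp16 K*Q kernel (illustrative only).
    acc = np.float16(0.0)
    for a, b in zip(k, q):
        acc = np.float16(acc + np.float16(a) * np.float16(b))
    return acc

head_dim = 128
# Moderately large (but individually representable) fp16 values.
k = np.full(head_dim, 24.0, dtype=np.float16)
q = np.full(head_dim, 24.0, dtype=np.float16)

# The running fp16 sum exceeds 65504 partway through and becomes inf.
print(dot_fp16(k, q))  # inf
# The same dot product with fp32 accumulation is exact: 24*24*128 = 73728.
print(np.dot(k.astype(np.float32), q.astype(np.float32)))  # 73728.0
```

Once one K*Q entry is inf, the softmax over that row becomes NaN, which is consistent with entire models producing garbage rather than merely losing a little accuracy.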