ik_llama.cpp pull request 62: Use fp32 for K*Q in Metal FA implementation


Without fp32 for K*Q in the Metal flash attention (FA) implementation, some models (e.g., Qwen2-7B-Instruct) produce garbage. The change is borrowed from PR-9595 in mainline llama.cpp.

Strangely enough, K*Q is done using fp16 in my ARM_NEON FA implementation, and it works just fine there.
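
To illustrate why the precision used for K*Q can matter, here is a minimal Python sketch that simulates one K*Q dot product with fp16 versus fp32 accumulation. This is not the Metal kernel: the helper names (`to_fp16`, `to_fp32`, `dot_fp16`, `dot_fp32`), the head size of 128, and the input values are all made up for illustration. That overflow (rather than plain loss of precision) is what breaks Qwen2-7B-Instruct is an assumption, not something established above.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE half-precision value (overflow becomes +/-inf)."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses finite values that do not fit in fp16; real fp16 hardware
        # would produce infinity here.
        return math.copysign(math.inf, x)


def to_fp32(x: float) -> float:
    """Round x to the nearest IEEE single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def dot_fp16(k, q):
    """Dot product where every product and every partial sum is rounded to fp16."""
    acc = 0.0
    for a, b in zip(k, q):
        prod = to_fp16(to_fp16(a) * to_fp16(b))
        acc = to_fp16(acc + prod)
    return acc


def dot_fp32(k, q):
    """Same fp16 inputs, but products and partial sums are kept in fp32."""
    acc = 0.0
    for a, b in zip(k, q):
        prod = to_fp32(to_fp16(a) * to_fp16(b))
        acc = to_fp32(acc + prod)
    return acc


# Hypothetical K row and Q row with fairly large activations (illustrative only).
head_size = 128
k = [30.0] * head_size
q = [30.0] * head_size

score_fp16 = dot_fp16(k, q)
score_fp32 = dot_fp32(k, q)
scale = 1.0 / math.sqrt(head_size)

print("fp16 accumulation:", score_fp16)
print("fp32 accumulation:", score_fp32)
print("scaled fp32 score:", score_fp32 * scale)
```

With these inputs the true sum is 115200, which is above the largest finite fp16 value (65504), so the fp16 path overflows to infinity while the fp32 path stays finite. The scaled score (about 10182) would itself fit in fp16, so in this sketch it is only the intermediate sum that needs the wider type. An infinite score fed into softmax would turn into NaN, which is one plausible way to end up with garbage output.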