ik_llama.cpp/github-data/pull_requests/62 - Use fp32 for K_Q in Metal FA implementation.md
2025-07-23 13:31:53 +02:00


🔀 #62 - Use fp32 for K*Q in Metal FA implementation

Author ikawrakow
State Closed
Created 2024-09-25
Updated 2024-09-25

Description

Otherwise, some models (e.g., Qwen2-7B-Instruct) produce garbage output. Borrowed from PR-9595 in mainline llama.cpp.

Strangely enough, K*Q is computed in fp16 in my ARM_NEON FA implementation, and it works just fine there.
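The failure mode behind this change can be illustrated without Metal at all: the K*Q dot products in flash attention sum many products per head dimension, and an fp16 accumulator (max finite value 65504) can overflow to inf for activations that an fp32 accumulator handles exactly. The sketch below simulates an fp16-accumulating dot product in numpy; the values and `head_dim` are made up for illustration, and this is not the actual Metal kernel.

```python
import numpy as np

def dot_fp16(k, q):
    # Simulate a dot product whose accumulator is also fp16,
    # as in a hypothetical all-fp16 K*Q kernel (illustrative only).
    acc = np.float16(0.0)
    for a, b in zip(k, q):
        acc = np.float16(acc + np.float16(a) * np.float16(b))
    return acc

head_dim = 128
# Moderately large (but individually representable) fp16 values.
k = np.full(head_dim, 24.0, dtype=np.float16)
q = np.full(head_dim, 24.0, dtype=np.float16)

# The running fp16 sum exceeds 65504 partway through and becomes inf.
print(dot_fp16(k, q))  # inf
# The same dot product with fp32 accumulation is exact: 24*24*128 = 73728.
print(np.dot(k.astype(np.float32), q.astype(np.float32)))  # 73728.0
```

Once one K*Q entry is inf, the softmax over that row becomes NaN, which is consistent with entire models producing garbage rather than merely losing a little accuracy.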