🐛 #364 - Fix FA bug on AVX2
| Author | ikawrakow |
|---|---|
| State | ❌ Closed |
| Created | 2025-05-01 |
| Updated | 2025-05-02 |
Description
The bug was quite subtle: we have a Q8_0 K-cache, so we need to quantize the Q tensor to the appropriate quantization type (vec_dot_type in ggml lingo), which differs from platform to platform. We pick the type correctly. But then we notice that it is a GQA case, so we repack the K tensor to Q8_0_R8 for faster processing, while still using the vec_dot_type that was selected based on K being Q8_0. On Zen4 and ARM_NEON the vec_dot_type is the same for both, so everything works fine. But on AVX2 the vec_dot_type changes, and we get gibberish (or even an assert on a NaN value).
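To make the ordering concrete, here is a minimal C++ sketch of the described bug. All names (`vec_dot_type_for`, `flash_attn_setup`, the enum values) are hypothetical illustrations, not the actual ggml/iqk symbols, and the AVX2 type divergence is an assumption used only to show the mechanism: vec_dot_type is chosen from K's original type, and the later GQA repack silently invalidates that choice on platforms where the two kernels expect different Q quantizations.

```cpp
#include <cassert>

// Hypothetical stand-ins for the real ggml types involved in this bug.
enum ggml_type { TYPE_Q8_0, TYPE_Q8_0_R8, TYPE_Q8_1_X4 };

// Hypothetical per-platform lookup: which quantization Q must use so the
// vec_dot kernel for the given K type can consume it.
ggml_type vec_dot_type_for(ggml_type k_type) {
#if defined(__AVX2__) && !defined(__AVX512F__)
    // Assumption for illustration: on plain AVX2 the repacked kernel wants
    // a different Q quantization than the plain Q8_0 kernel does.
    return k_type == TYPE_Q8_0_R8 ? TYPE_Q8_1_X4 : TYPE_Q8_0;
#else
    // On Zen4 (AVX512) and ARM_NEON both kernels happen to expect the same
    // type, which is why the bug never showed up there.
    return TYPE_Q8_0;
#endif
}

void flash_attn_setup(bool is_gqa) {
    ggml_type k_type       = TYPE_Q8_0;
    ggml_type vec_dot_type = vec_dot_type_for(k_type); // chosen from Q8_0

    if (is_gqa) {
        k_type = TYPE_Q8_0_R8;  // repack K for faster GQA processing
        // BUG: vec_dot_type is not refreshed here, so on AVX2 the Q tensor
        // gets quantized for the wrong kernel -> gibberish / NaN assert.
        // Fix: re-derive it from the type K will actually have:
        // vec_dot_type = vec_dot_type_for(k_type);
    }

    // Holds on Zen4/ARM_NEON by coincidence; fails on AVX2 in the GQA case.
    assert(vec_dot_type == vec_dot_type_for(k_type));
}
```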
The bug was introduced in my recent CPU FA optimization round (#351)
Closes #363
💬 Conversation
👤 ikawrakow commented on 2025-05-02 at 05:09:05:
It looks like this does not fully fix #363, but I'll merge it anyway so that two real bugs don't remain on the main branch.