ik_llama.cpp/227 - Prevent FA usage on CUDA when K and V head sizes are different.md at eaa2510a28b60d43c2210c69cefdf750d5cc119f - ik_llama.cpp

ikawrakow/ik_llama.cpp

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-01-26 09:09:50 +00:00

Thomas eaa2510a28 Add GitHub data: filename sanitization (#640 )

2025-07-23 13:31:53 +02:00

📝 #227 - Prevent FA usage on CUDA when K and V head sizes are different

Author	`ikawrakow`
State	❌ Closed
Created	2025-02-23
Updated	2025-03-20

Description

The CUDA flash attention (FA) implementation does not support K and V head sizes that differ (e.g., DeepSeekV3/R1/Lite). Enabling it for such models leads to seemingly random error messages or garbage output. Since the user may not know this detail, it is better to prevent CUDA FA usage in such cases.
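The kind of guard described above could look roughly like the following. This is a minimal sketch and not the actual change in this PR: the `ModelParams` and `ContextParams` structs, their field names, and `maybe_disable_flash_attn` are hypothetical stand-ins chosen for illustration.

```cpp
#include <cstdio>

// Hypothetical, simplified stand-in for the model hyperparameters.
struct ModelParams {
    int n_embd_head_k; // per-head size of K
    int n_embd_head_v; // per-head size of V
};

// Hypothetical, simplified stand-in for the context parameters.
struct ContextParams {
    bool flash_attn; // the user requested flash attention
    bool use_cuda;   // the CUDA backend is in use
};

// Turns FA off when it was requested on CUDA but the K and V head sizes differ.
// Returns true if the request was overridden, false if nothing changed.
bool maybe_disable_flash_attn(const ModelParams & model, ContextParams & ctx) {
    if (ctx.flash_attn && ctx.use_cuda &&
        model.n_embd_head_k != model.n_embd_head_v) {
        std::fprintf(stderr,
            "warning: CUDA FA is not supported for K head size %d != V head size %d, disabling FA\n",
            model.n_embd_head_k, model.n_embd_head_v);
        ctx.flash_attn = false;
        return true;
    }
    return false;
}
```

Calling such a check once while the context is being set up, before any compute graph is built, would turn an unsupported configuration into a single clear warning instead of a failure later on.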

💬 Conversation

👤 saood06 commented the 2025-03-20 at 01:41:17:

Can this be closed now? I think https://github.com/ikawrakow/ik_llama.cpp/pull/268 handled the only case left where CUDA was not supported.

👤 ikawrakow commented the 2025-03-20 at 16:33:31:

Yes, closing it.
