🔀 #607 - vulkan: support softmax/FA batch and broadcast
| Author | firecoperana |
|---|---|
| State | ❌ Closed |
| Created | 2025-07-13 |
| Updated | 2025-07-16 |
Description
vulkan: support softmax/FA batch and broadcast (port of https://github.com/ggml-org/llama.cpp/pull/14449). Fixes gibberish output when FA is enabled for some models.
The recent FA-for-DeepSeek-MLA PR was missing this change, which caused gibberish output in some models.
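For context, the "broadcast" here refers to ggml's head-broadcast semantics: in GQA/MLA models several query heads share one K/V head, so the softmax/FA kernels must map a query-head index to its K/V head rather than assume a 1:1 layout. Below is a minimal CPU-side sketch of that mapping, not the actual Vulkan shader code; the helper name `kv_head_for` is illustrative.

```cpp
// Sketch of the KV-head broadcast needed by FA/softmax for GQA/MLA models.
// With n_head_q query heads and n_head_kv K/V heads (n_head_q a multiple
// of n_head_kv), query head hq reads K/V head hq / (n_head_q / n_head_kv),
// mirroring ggml's iq2 / (neq2 / nek2) indexing in the CPU reference.
#include <cassert>
#include <cstdio>

int kv_head_for(int hq, int n_head_q, int n_head_kv) {
    assert(n_head_q % n_head_kv == 0 && hq < n_head_q);
    return hq / (n_head_q / n_head_kv);
}

int main() {
    const int n_head_q = 8, n_head_kv = 2; // 4 query heads per K/V head
    for (int hq = 0; hq < n_head_q; ++hq) {
        printf("q head %d -> kv head %d\n", hq, kv_head_for(hq, n_head_q, n_head_kv));
    }
    return 0;
}
```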
- I have read the contributing guidelines
- Self-reported review complexity:
  - Low
  - Medium
  - High
💬 Conversation
👤 ubergarm commented on 2025-07-13 at 19:09:26:
Great, this fixes the gibberish issue we were seeing over on #598 when I run with KHR_coopmat and -fa enabled:
ggml_vulkan: 0 = NVIDIA GeForce RTX 3090 Ti (NVIDIA) | uma: 0 | fp16: 1 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: KHR_coopmat
However, on the AMD GPU rig it no longer outputs that same gibberish; instead it chokes/freezes around the same point where it used to, and then very slowly outputs 3333:
$ ./build/bin/llama-server --version
version: 3796 (69ab6921)
built with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu
ggml_vulkan: 0 = Radeon RX 7900 XTX (AMD open-source driver) | uma: 0 | fp16: 1 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat
... For example, in French, numbers from to 10 are all irregular except for 11-16 which333^C
Response cancelled.
Also, I get similar behavior, starting out okay and then degenerating to 33333, on my NVIDIA GPU when running with NV_coopmat2:
ggml_vulkan: 0 = NVIDIA GeForce RTX 3090 Ti (NVIDIA) | uma: 0 | fp16: 1 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: NV_coopmat2
...Maybe the user is learning French or needs it for a specific purpose. They might be preparing for a trip, studying, or33333333333333333333333333333333333333333333333333333333333333333333333333333333333^C
Response cancelled.
So this PR does seem to fix the NVIDIA KHR_coopmat -fa enabled path.
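For anyone retesting, a minimal launch sketch along the lines of the runs above (the model path is a placeholder; -fa and -ngl are the standard flash-attention and GPU-offload flags):

```bash
# Hypothetical repro: model path is a placeholder.
# -fa enables the flash-attention path exercised above;
# -ngl 99 offloads all layers to the Vulkan device.
./build/bin/llama-server -m /models/deepseek-mla-model.gguf -fa -ngl 99
```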
👤 firecoperana commented on 2025-07-13 at 23:46:43:
Can you try again?
👤 ikawrakow commented on 2025-07-15 at 06:04:07:
@firecoperana
Is this necessary after #608?
👤 firecoperana commented on 2025-07-15 at 12:30:20:
Already included in main.