🔀 #607 - vulkan: support softmax/FA batch and broadcast
| Author | firecoperana |
|---|---|
| State | ❌ Closed |
| Created | 2025-07-13 |
| Updated | 2025-07-16 |
Description
vulkan: support softmax/FA batch and broadcast (port of https://github.com/ggml-org/llama.cpp/pull/14449). Fixes gibberish output when FA is enabled for some models.
The recent FA-for-DeepSeek-MLA PR was missing this change, which caused gibberish output in some models.
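For context, the "broadcast" here refers to ggml's head-broadcast semantics: in GQA/MLA models several query heads share one K/V head, so the softmax/FA kernels must map a query-head index to its K/V head rather than assume a 1:1 layout. Below is a minimal CPU-side sketch of that mapping, not the actual Vulkan shader code; the helper name `kv_head_for` is illustrative.

```cpp
// Sketch of the KV-head broadcast needed by FA/softmax for GQA/MLA models.
// With n_head_q query heads and n_head_kv K/V heads (n_head_q a multiple
// of n_head_kv), query head hq reads K/V head hq / (n_head_q / n_head_kv),
// mirroring ggml's iq2 / (neq2 / nek2) indexing in the CPU reference.
#include <cassert>
#include <cstdio>

int kv_head_for(int hq, int n_head_q, int n_head_kv) {
    assert(n_head_q % n_head_kv == 0 && hq < n_head_q);
    return hq / (n_head_q / n_head_kv);
}

int main() {
    const int n_head_q = 8, n_head_kv = 2; // 4 query heads per K/V head
    for (int hq = 0; hq < n_head_q; ++hq) {
        printf("q head %d -> kv head %d\n", hq, kv_head_for(hq, n_head_q, n_head_kv));
    }
    return 0;
}
```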
- I have read the contributing guidelines
- Self-reported review complexity:
  - Low
  - Medium
  - High
💬 Conversation
👤 ubergarm commented on 2025-07-13 at 19:09:26:
Great, this fixes the gibberish issue we were seeing over on #598 when I run with KHR_coopmat and -fa enabled:
ggml_vulkan: 0 = NVIDIA GeForce RTX 3090 Ti (NVIDIA) | uma: 0 | fp16: 1 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: KHR_coopmat
However, on the AMD GPU rig it no longer outputs that same gibberish; instead it chokes/freezes around the same point where it used to, and then very slowly outputs 3333:
$ ./build/bin/llama-server --version
version: 3796 (69ab6921)
built with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu
ggml_vulkan: 0 = Radeon RX 7900 XTX (AMD open-source driver) | uma: 0 | fp16: 1 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat
... For example, in French, numbers from to 10 are all irregular except for 11-16 which333^C
Response cancelled.
Also, I get similar behavior, starting out okay and then degenerating to 33333, on my NVIDIA GPU when running with NV_coopmat2:
ggml_vulkan: 0 = NVIDIA GeForce RTX 3090 Ti (NVIDIA) | uma: 0 | fp16: 1 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: NV_coopmat2
...Maybe the user is learning French or needs it for a specific purpose. They might be preparing for a trip, studying, or33333333333333333333333333333333333333333333333333333333333333333333333333333333333^C
Response cancelled.
So this PR does seem to fix the NVIDIA KHR_coopmat -fa enabled path.
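For anyone retesting, a minimal launch sketch along the lines of the runs above (the model path is a placeholder; -fa and -ngl are the standard flash-attention and GPU-offload flags):

```bash
# Hypothetical repro: model path is a placeholder.
# -fa enables the flash-attention path exercised above;
# -ngl 99 offloads all layers to the Vulkan device.
./build/bin/llama-server -m /models/deepseek-mla-model.gguf -fa -ngl 99
```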
👤 firecoperana commented on 2025-07-13 at 23:46:43:
Can you try again?
👤 ikawrakow commented on 2025-07-15 at 06:04:07:
@firecoperana
Is this necessary after #608?
👤 firecoperana commented on 2025-07-15 at 12:30:20:
Already included in main.