### 🔀 [#607](https://github.com/ikawrakow/ik_llama.cpp/pull/607) - vulkan: support softmax/FA batch and broadcast

| | |
| :--- | :--- |
| **Author** | `firecoperana` |
| **State** | ❌ **Closed** |
| **Created** | 2025-07-13 |
| **Updated** | 2025-07-16 |

---

#### Description

vulkan: support softmax/FA batch and broadcast

https://github.com/ggml-org/llama.cpp/pull/14449

Fixes gibberish output when FA is enabled for some models. The new FA for DeepSeek MLA PR is missing this, which caused gibberish output in some models.

- [x] I have read the [contributing guidelines](https://github.com/ggerganov/llama.cpp/blob/master/CONTRIBUTING.md)
- Self-reported review complexity:
  - [ ] Low
  - [x] Medium
  - [ ] High

---

#### 💬 Conversation

👤 **ubergarm** commented the **2025-07-13** at **19:09:26**:
Great, this fixes the gibberish issue we were seeing over on #598 when I run with `KHR_coopmat` and `-fa` enabled:

```
ggml_vulkan: 0 = NVIDIA GeForce RTX 3090 Ti (NVIDIA) | uma: 0 | fp16: 1 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: KHR_coopmat
```

However, on the AMD GPU rig it no longer outputs that same-looking gibberish, but now it kind of chokes/freezes up around the same point where it used to throw gibberish, then very slowly outputs `3333`:

```
$ ./build/bin/llama-server --version
version: 3796 (69ab6921)
built with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu
ggml_vulkan: 0 = Radeon RX 7900 XTX (AMD open-source driver) | uma: 0 | fp16: 1 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat
...
For example, in French, numbers from to 10 are all irregular except for 11-16 which333^C Response cancelled.
```

Also, I get similar behavior where it starts out okay and then goes to `33333` on my NVIDIA GPU when running with `NV_coopmat2`:

```bash
ggml_vulkan: 0 = NVIDIA GeForce RTX 3090 Ti (NVIDIA) | uma: 0 | fp16: 1 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: NV_coopmat2
...Maybe the user is learning French or needs it for a specific purpose. They might be preparing for a trip, studying, or33333333333333333333333333333333333333333333333333333333333333333333333333333333333^C Response cancelled.
```

So this PR does seem to fix the NVIDIA `KHR_coopmat` `-fa` enabled path.

---

👤 **firecoperana** commented the **2025-07-13** at **23:46:43**:
Can you try again? --- 👤 **ikawrakow** commented the **2025-07-15** at **06:04:07**:
@firecoperana Is this necessary after #608? --- 👤 **firecoperana** commented the **2025-07-15** at **12:30:20**:
Already included in main.