### 🐛 [#446](https://github.com/ikawrakow/ik_llama.cpp/pull/446) - Fix bug in MMVQ kernel
| **Author** | `ikawrakow` |
| :--- | :--- |
| **State** | ❌ **Closed** |
| **Created** | 2025-05-23 |
| **Updated** | 2025-05-24 |
---
#### Description
After a very long bug hunt, this PR should hopefully fix #389, #398, #425.
Thanks to everybody who tested my previous bug fix attempts!
Huge kudos to @ciprianveg who was instrumental in finding the bug!
The bug was in the CUDA matrix-vector multiplication kernel (a.k.a. MMVQ). It only shows up when the kernel processes 2 or 3 tokens. Hence, it was not observed during token generation (TG), and only showed up during prompt processing (PP) when an expert in a MoE model ended up having to process just 2 or 3 tokens from the batch (which is rare).
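To illustrate why the failing case is rare, here is a minimal sketch that counts how many tokens of a batch land on each expert and picks out the experts that receive exactly 2 or 3. The helper names `tokens_per_expert` and `experts_with_2_or_3_tokens` and the `routing` layout are assumptions made for this example; they are not code from this PR or from `mmvq.cu`, and the sketch does not reproduce the kernel bug itself.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not part of ik_llama.cpp): routing[t] holds the expert
// ids selected for token t. Returns the number of tokens routed to each expert.
std::vector<int> tokens_per_expert(const std::vector<std::vector<int>>& routing,
                                   int n_experts) {
    std::vector<int> counts(n_experts, 0);
    for (const auto& experts : routing) {
        for (int e : experts) {
            assert(e >= 0 && e < n_experts);  // guard against invalid expert ids
            ++counts[e];
        }
    }
    return counts;
}

// Returns the ids of experts that received exactly 2 or 3 tokens, i.e. the
// per-expert batch sizes at which the PR description says the bug appeared.
std::vector<int> experts_with_2_or_3_tokens(const std::vector<int>& counts) {
    std::vector<int> hit;
    for (int e = 0; e < static_cast<int>(counts.size()); ++e) {
        if (counts[e] == 2 || counts[e] == 3) {
            hit.push_back(e);
        }
    }
    return hit;
}
```

With a single token per step, as in TG, every expert sees at most one token, so the second function returns an empty list. With a larger PP batch, an expert only lands in the 2 or 3 range if the routing happens to send it that few tokens.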
I believe all other changes I made in #442 are not necessary, but please test this PR to confirm.
Closes #389
Closes #398
Closes #425
---
#### 💬 Conversation
👤 **ciprianveg** commented the **2025-05-23** at **11:29:36**:
Thank you for the fix!🍻
On Fri, 23 May 2025, 12:17 Kawrakow wrote:
> Commit Summary
>
> - 193a15b Fix bug in MMVQ kernel
>
> File Changes (1 file)
>
> - *M* ggml/src/ggml-cuda/mmvq.cu (5)
---
👤 **ikawrakow** commented the **2025-05-23** at **15:25:05**:
I think I'll merge this now. It fixes a real bug, so it should be merged irrespective of whether it fixes #389, #398, #425.
---
👤 **Panchovix** commented the **2025-05-23** at **16:00:18**:
Amazing, thanks for all your work!
---
👤 **p4s2wd** commented the **2025-05-24** at **05:12:04**:
Thank you!
---
👤 **pt13762104** commented the **2025-05-24** at **09:31:08**:
It's working fine now, thank you for your patience