Files
ik_llama.cpp/github-data/issues/249 - CUDA_ results for MoE models are not reproducible.md
2025-07-23 13:31:53 +02:00

32 lines
1.1 KiB
Markdown

### 📝 [#249](https://github.com/ikawrakow/ik_llama.cpp/issues/249) - CUDA: results for MoE models are not reproducible
| **Author** | `ikawrakow` |
| :--- | :--- |
| **State** | ❌ **Closed** |
| **Created** | 2025-03-10 |
| **Updated** | 2025-03-25 |
---
#### Description
### What happened?
Running `llama-perplexity` with the same MoE model (observed with DeepSeek-Lite) produces different PPL values in each run.
The non-reproducibility is not observed for TG when using the same random seed.
### Name and Version
All versions. The issue is also present in mainline `llama.cpp` (tested with latest as of today (`build: 4858 (1e2f78a0)`), so it is not due to a change I made. I think the non-reproducibility is due to [this kernel](https://github.com/ikawrakow/ik_llama.cpp/blob/b096a5de7a9bdf516bb20729d5d0a3b2a12cba2f/ggml/src/ggml-cuda.cu#L2039), where the order in which the rows of the `src1` tensor are copied to contiguous memory depends on how the stars have fallen today.
### What operating system are you seeing the problem on?
_No response_
### Relevant log output
```shell
```