ikawrakow/ik_llama.cpp

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-01-26 09:09:50 +00:00

Thomas eaa2510a28 Add GitHub data: filename sanitization (#640)

2025-07-23 13:31:53 +02:00

📝 #249 - CUDA: results for MoE models are not reproducible

Author	`ikawrakow`
State	❌ Closed
Created	2025-03-10
Updated	2025-03-25

Description

What happened?

Running llama-perplexity with the same MoE model (observed with DeepSeek-Lite) produces different PPL values in each run.

The non-reproducibility is not observed for token generation (TG) when using the same random seed.

Name and Version

All versions. The issue is also present in mainline llama.cpp (tested with the latest as of today, build 4858 (1e2f78a0)), so it is not due to a change I made. I think the non-reproducibility is due to this kernel, where the order in which the rows of the src1 tensor are copied to contiguous memory depends on how the stars have fallen today, i.e., it is not fixed from run to run.
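
To illustrate why a varying row order could matter at all, here is a minimal CPU-side sketch. It is not the kernel in question, and it assumes (the issue does not state this explicitly) that the row order ends up changing the order of a floating-point accumulation somewhere downstream. The function `sum_in_order` and the index list `order` are hypothetical names introduced only for this example; `order` stands in for whatever order the rows happen to arrive in.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: accumulate `values` in single precision,
// visiting them in the sequence given by `order`.
// Float addition is not associative, so the visiting order can change the result.
float sum_in_order(const std::vector<float>& values,
                   const std::vector<std::size_t>& order) {
    float acc = 0.0f;
    for (std::size_t idx : order) {
        acc += values[idx];  // each step rounds to the nearest float
    }
    return acc;
}
```

With the values `{1e8f, 1.0f, -1e8f}`, the order `{0, 1, 2}` loses the `1.0f` when it is added to `1e8f` (adjacent floats near 1e8 are 8 apart), whereas the order `{0, 2, 1}` cancels the large terms first and keeps it. The two sums differ even though the inputs are identical, which is the kind of run-to-run variation a non-deterministic row order could produce in a PPL computation.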

What operating system are you seeing the problem on?

No response
