🐛 #281 - Bug: Strange dips in TG performance
| Author | saood06 |
|---|---|
| State | ❌ Closed |
| Created | 2025-03-22 |
| Updated | 2025-03-23 |
### Description

#### What happened?
As mentioned in https://github.com/ikawrakow/ik_llama.cpp/pull/273, I have seen this behavior with llama-server (sorry, I never noted the exact configurations or models it occurs with). I can usually mitigate it by canceling and restarting generation until TG performance returns to the expected value; the chart below shows this behavior captured in a benchmark.
I am also fairly certain I have never encountered this bug in batched-bench, only in server and sweep-bench, both of which manipulate the KV cache more than batched-bench does.
### Name and Version

The graph capturing this behavior was produced on commit 3d6e25c82d.
### What operating system are you seeing the problem on?
Linux
### Relevant log output
💬 Conversation
👤 saood06 commented the 2025-03-23 at 13:11:13:
Closing via #282
PP performance for those options:
For my primary use case, MLA-3 on is the best, giving good PP and TG. However, for tasks with very small PP and TG where context stays under 8K, MLA-1 off is useful.
Thank you for the quick find and fix.