
🐛 #281 - Bug: Strange dips in TG performance

Author saood06
State Closed
Created 2025-03-22
Updated 2025-03-23

Description

What happened?

As mentioned in https://github.com/ikawrakow/ik_llama.cpp/pull/273, I've seen this behavior occur with llama-server (sorry, I never noted the configurations or models it occurs with). I can usually mitigate it by canceling and then restarting generation until TG performance returns to the expected value. The chart below shows this behavior captured in a benchmark.

[Image: benchmark chart showing the TG performance dips]
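
A minimal sketch of that mitigation is below: it requests short completions from llama-server until the reported TG rate recovers. It assumes a server on localhost:8080 with `jq` and `bc` installed; the endpoint and `timings.predicted_per_second` field follow llama.cpp's server API, and the 20 t/s threshold is an arbitrary placeholder, not a figure from this issue.

```bash
# Poll the server with short generations until TG speed looks normal again.
while true; do
  tps=$(curl -s http://localhost:8080/completion \
          -d '{"prompt": "Hello", "n_predict": 64}' \
        | jq '.timings.predicted_per_second')
  echo "TG: ${tps} t/s"
  # 20 t/s is a placeholder for "the expected value".
  if (( $(echo "${tps} > 20" | bc -l) )); then
    break
  fi
done
```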

Also, I'm fairly certain I've never encountered this bug in batched-bench, only in server and sweep-bench, both of which manipulate the KV cache more than batched-bench does.
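
For context, a comparison along those lines could be run with the two tools as sketched here. The batched-bench flags (`-npp`, `-ntg`, `-npl`) match upstream llama.cpp; the sweep-bench invocation is an assumption about ik_llama.cpp's tool and should be checked against its `--help`.

```bash
# batched-bench processes fixed prompt/generation batches, so the KV cache
# follows a simple, repeatable pattern.
./llama-batched-bench -m model.gguf -c 8192 -npp 512 -ntg 128 -npl 1

# sweep-bench walks the context window, exercising the KV cache much more;
# the exact flags here are an assumption.
./llama-sweep-bench -m model.gguf -c 8192
```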

Name and Version

The graph capturing this behavior was produced on commit 3d6e25c82d.

What operating system are you seeing the problem on?

Linux

Relevant log output



💬 Conversation

👤 saood06 commented on 2025-03-23 at 13:11:13:

Closing via #282

[Image: TG performance for the tested options]

PP performance for those options:

[Image: PP performance for the tested options]

For my primary use case, MLA-3 on is the best, with good PP and TG. For tasks with very small PP and TG that keep context under 8K, though, MLA-1 off seems useful.
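
(For reference, "MLA-3 on" and "MLA-1 off" presumably refer to ik_llama.cpp's `-mla` level with flash attention on or off. The invocation below is a hypothetical sketch of the "MLA-3 on" case, not a command taken from this issue, so flag spellings should be verified against `--help`.)

```bash
# Assumed reading of "MLA-3 on": MLA level 3 with flash attention enabled.
./llama-server -m model.gguf -c 16384 -mla 3 -fa
```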

Thank you for the quick find and fix.