🐛 #281 - Bug: Strange dips in TG performance
| Author | saood06 |
|---|---|
| State | ❌ Closed |
| Created | 2025-03-22 |
| Updated | 2025-03-23 |
### Description

#### What happened?
As mentioned in https://github.com/ikawrakow/ik_llama.cpp/pull/273, I have seen this behavior with llama-server (sorry, I never noted the exact configurations or models it occurs with). I can usually mitigate it by canceling and restarting generation until TG performance returns to the expected value; the chart below shows this behavior captured in a benchmark.
I am also fairly certain I have never encountered this bug in batched-bench, only in server and sweep-bench, both of which manipulate the KV cache more than batched-bench does.
### Name and Version

The graph capturing this behavior was produced on commit 3d6e25c82d.
### What operating system are you seeing the problem on?
Linux
### Relevant log output
💬 Conversation
👤 saood06 commented the 2025-03-23 at 13:11:13:
Closing via #282
PP performance for those options:
For my primary use case, MLA-3 on is the best, giving good PP and TG. However, for tasks with very small PP and TG where context stays under 8K, MLA-1 off is useful.
Thank you for the quick find and fix.