mirror of
https://github.com/ikawrakow/ik_llama.cpp.git
synced 2026-05-03 21:01:43 +00:00
### 🐛 [#281](https://github.com/ikawrakow/ik_llama.cpp/issues/281) - Bug: Strange dips in TG performance

| **Author** | `saood06` |
| :--- | :--- |
| **State** | ❌ **Closed** |
| **Created** | 2025-03-22 |
| **Updated** | 2025-03-23 |

---

#### Description

### What happened?

As mentioned in https://github.com/ikawrakow/ik_llama.cpp/pull/273, I've seen this behavior occur with llama-server (sorry, I never noted the exact configurations or models it occurs with). I can usually mitigate it by canceling and then restarting generation until TG performance returns to the expected value. The chart below shows this behavior captured in a benchmark.

![Image](https://github.com/user-attachments/assets/8e6f64c0-deb1-42e4-ba1e-0a1fb9d1f02e)

Also, I'm fairly certain I've never encountered this bug in batched-bench, only in server and sweep-bench, both of which manipulate the KV cache more than batched-bench does.

### Name and Version

The graph capturing this behavior was produced on commit https://github.com/ikawrakow/ik_llama.cpp/commit/3d6e25c82db5510df483185b8a20f0ce01136dd7.

### What operating system are you seeing the problem on?

Linux

### Relevant log output

```shell
```

---

#### 💬 Conversation

👤 **saood06** commented on **2025-03-23** at **13:11:13**:<br>

Closing via #282

![Image](https://github.com/user-attachments/assets/8ad39421-7e46-4b47-9b5c-68b19f9b508c)

PP performance for those options:

![Image](https://github.com/user-attachments/assets/2fe06522-bb5c-4f1b-8a04-facb14be9e70)

For my primary use case, MLA-3 on is the best, with nice PP and TG. It seems, though, that for tasks with very small PP and TG, keeping context under 8K, MLA-1 off is useful.

Thank you for the quick find and fix.