### 🔀 [#518](https://github.com/ikawrakow/ik_llama.cpp/pull/518) - IQ3_S: much faster CPU prompt processing
| **Author** | `ikawrakow` |
| :--- | :--- |
| **State** | ❌ **Closed** |
| **Created** | 2025-06-11 |
| **Updated** | 2025-06-12 |
---
#### Description
Same approach as in PRs #515, #516, and #517.

Here is a sweep-bench run with this PR for LLaMA-3.1-8B on a Ryzen-7950X CPU:

| PP | TG | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s |
|-------|--------|--------|----------|----------|----------|----------|
| 512 | 128 | 0 | 1.733 | 295.36 | 8.239 | 15.54 |
| 512 | 128 | 512 | 1.805 | 283.62 | 8.398 | 15.24 |
| 512 | 128 | 1024 | 1.857 | 275.73 | 8.561 | 14.95 |
| 512 | 128 | 1536 | 1.905 | 268.74 | 8.430 | 15.18 |
| 512 | 128 | 2048 | 1.954 | 261.97 | 8.563 | 14.95 |

I haven't done this for a while, but for this one I think it is worth looking at mainline `llama.cpp` (build: `5635 (3069e3169)`):

| PP | TG | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s |
|-------|--------|--------|----------|----------|----------|----------|
| 512 | 128 | 0 | 18.261 | 28.04 | 7.933 | 16.14 |
| 512 | 128 | 512 | 18.708 | 27.37 | 8.335 | 15.36 |
| 512 | 128 | 1024 | 19.048 | 26.88 | 8.547 | 14.98 |
| 512 | 128 | 1536 | 19.480 | 26.28 | 8.739 | 14.65 |
| 512 | 128 | 2048 | 19.670 | 26.03 | 8.912 | 14.36 |

About 10X faster PP with this PR (e.g. 295.36 vs. 28.04 t/s at `N_KV = 0`)!
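
The 10X figure can be checked directly from the `S_PP t/s` columns of the two tables. The snippet below is only a small sketch of that arithmetic; the names `pp_this_pr`, `pp_mainline`, and `pp_speedup` are made up for illustration and are not part of either code base.

```python
# S_PP (prompt-processing tokens/second) copied from the two tables above,
# keyed by N_KV. The variable and function names are hypothetical.
pp_this_pr = {0: 295.36, 512: 283.62, 1024: 275.73, 1536: 268.74, 2048: 261.97}
pp_mainline = {0: 28.04, 512: 27.37, 1024: 26.88, 1536: 26.28, 2048: 26.03}


def pp_speedup(this_pr, mainline):
    """Return {N_KV: this-PR S_PP divided by mainline S_PP}."""
    return {n_kv: this_pr[n_kv] / mainline[n_kv] for n_kv in this_pr}


speedups = pp_speedup(pp_this_pr, pp_mainline)
for n_kv, ratio in speedups.items():
    print(f"N_KV={n_kv:5d}  PP speedup = {ratio:.2f}x")
```

With the numbers above, every ratio falls between roughly 10.0x and 10.6x, so the speedup holds across the whole `N_KV` range that was measured.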