### 🔀 [#518](https://github.com/ikawrakow/ik_llama.cpp/pull/518) - IQ3_S: much faster CPU prompt processing

| **Author** | `ikawrakow` |
| :--- | :--- |
| **State** | ❌ **Closed** |
| **Created** | 2025-06-11 |
| **Updated** | 2025-06-12 |

---

#### Description

Along the same lines as PRs #515, #516, and #517, this time for `IQ3_S`.

Here is a sweep-bench with this PR for LLaMA-3.1-8B on a Ryzen-7950X CPU:

|    PP |     TG |   N_KV |   T_PP s | S_PP t/s |   T_TG s | S_TG t/s |
|-------|--------|--------|----------|----------|----------|----------|
|   512 |    128 |      0 |    1.733 |   295.36 |    8.239 |    15.54 |
|   512 |    128 |    512 |    1.805 |   283.62 |    8.398 |    15.24 |
|   512 |    128 |   1024 |    1.857 |   275.73 |    8.561 |    14.95 |
|   512 |    128 |   1536 |    1.905 |   268.74 |    8.430 |    15.18 |
|   512 |    128 |   2048 |    1.954 |   261.97 |    8.563 |    14.95 |

I haven't done this for a while, but I think for this one it is worth looking at mainline `llama.cpp` (build: `5635 (3069e3169)`):

|    PP |     TG |   N_KV |   T_PP s | S_PP t/s |   T_TG s | S_TG t/s |
|-------|--------|--------|----------|----------|----------|----------|
|   512 |    128 |      0 |   18.261 |    28.04 |    7.933 |    16.14 |
|   512 |    128 |    512 |   18.708 |    27.37 |    8.335 |    15.36 |
|   512 |    128 |   1024 |   19.048 |    26.88 |    8.547 |    14.98 |
|   512 |    128 |   1536 |   19.480 |    26.28 |    8.739 |    14.65 |
|   512 |    128 |   2048 |   19.670 |    26.03 |    8.912 |    14.36 |

PP is more than 10X faster here (e.g., 295.36 vs 28.04 t/s at `N_KV = 0`), while TG is about the same!
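
For reference, the PP ratio at each `N_KV` can be recomputed from the `S_PP t/s` columns of the two tables above. This is only a small sketch: the numbers are copied by hand from the tables, and the names `S_PP_THIS_PR`, `S_PP_MAINLINE`, and `pp_speedups` are made up for illustration and are not part of either code base.

```python
# Prompt-processing throughput (S_PP, tokens/s), copied from the two tables.
# Keys are the N_KV values of each row.
S_PP_THIS_PR = {0: 295.36, 512: 283.62, 1024: 275.73, 1536: 268.74, 2048: 261.97}
S_PP_MAINLINE = {0: 28.04, 512: 27.37, 1024: 26.88, 1536: 26.28, 2048: 26.03}


def pp_speedups(this_pr, mainline):
    """Return {N_KV: speedup}, where speedup = S_PP(this PR) / S_PP(mainline)."""
    return {n_kv: this_pr[n_kv] / mainline[n_kv] for n_kv in sorted(this_pr)}


if __name__ == "__main__":
    for n_kv, ratio in pp_speedups(S_PP_THIS_PR, S_PP_MAINLINE).items():
        print(f"N_KV = {n_kv:5d}: {ratio:.2f}x faster PP")
```

The ratio stays above 10 for every row and shrinks slightly as `N_KV` grows.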