mirror of
https://github.com/ikawrakow/ik_llama.cpp.git
synced 2026-04-24 00:19:19 +00:00
Add GitHub data (#637)
github-data/pull_requests/4-Simdify and multi-thread tanh.md
### 🔀 [#4](https://github.com/ikawrakow/ik_llama.cpp/pull/4) - Simdify and multi-thread tanh

| **Author** | `ikawrakow` |
| :--- | :--- |
| **State** | ❌ **Closed** |
| **Created** | 2024-07-27 |
| **Updated** | 2024-07-27 |

---
#### Description
Gemma-2 performance seemed lower than expected for its size. Looking at the architecture, I noticed that `tanh` is used in each layer, and then at the end for soft-capping the final output. `ggml` computed `tanh` with a single thread. Combined with `tanh(x)` being a fairly expensive operation, this resulted in a significant fraction of the time being spent in the `tanh` operation.
After multi-threading `ggml_vec_soft_max_f32` and simd-ifying the `tanh` computation, I observe a 33% gain in prompt processing speed for Gemma-2-9b (!!!). TG is of course memory bound, but despite this we still get a ~2% boost at 4 threads (which gives max TG performance on my Ryzen-7950X).
Simd-ifying:

We have
```
tanh(x) = (exp(2*x) - 1)/(exp(2*x) + 1)
```
so we can just use Justine Tunney's SIMD implementation for the exponential function.