Add GitHub data: filename sanitization (#640)

2026-05-04 13:21:37 +00:00 · 2025-07-23 13:31:53 +02:00
parent 3600d82e98
commit eaa2510a28
626 changed files with 0 additions and 0 deletions
--- a/github-data/pull_requests/589
+++ b/github-data/pull_requests/589
@@ -0,0 +1,35 @@
+### 🔀 [#589](https://github.com/ikawrakow/ik_llama.cpp/pull/589) - CUDA: small PP performance improvement for MoE models
+
+| **Author** | `ikawrakow` |
+| :--- | :--- |
+| **State** | ❌ **Closed** |
+| **Created** | 2025-07-06 |
+| **Updated** | 2025-07-07 |
+
+---
+
+#### Description
+
+This PR brings a small (2-3%) prompt processing performance improvement on CUDA for quantized MoE models (when `-fmoe` is used).
+
+Instead of first copying activations to contiguous memory and then quantizing them, quantization is done directly using the row mapping IDs, thus saving the associated kernel launch overhead.
+
+Here is a performance comparison for `Q4_0` quantized DeepSeek-Lite on an RTX-4080 using `-mla 3 -fa -fmoe -b 4096 -ub 4096`:
+
+### Main branch
+
+|    PP |     TG |   N_KV |   T_PP s | S_PP t/s |   T_TG s | S_TG t/s |
+|-------|--------|--------|----------|----------|----------|----------|
+|  4096 |   1024 |      0 |    0.480 |  8532.52 |    5.640 |   181.55 |
+|  4096 |   1024 |   4096 |    0.566 |  7240.62 |    5.904 |   173.43 |
+|  4096 |   1024 |   8192 |    0.674 |  6073.99 |    6.143 |   166.68 |
+|  4096 |   1024 |  12288 |    0.789 |  5189.61 |    6.421 |   159.47 |
+
+### PR
+
+|    PP |     TG |   N_KV |   T_PP s | S_PP t/s |   T_TG s | S_TG t/s |
+|-------|--------|--------|----------|----------|----------|----------|
+|  4096 |   1024 |      0 |    0.469 |  8738.41 |    5.638 |   181.61 |
+|  4096 |   1024 |   4096 |    0.554 |  7388.85 |    5.909 |   173.29 |
+|  4096 |   1024 |   8192 |    0.670 |  6117.30 |    6.148 |   166.57 |
+|  4096 |   1024 |  12288 |    0.779 |  5256.86 |    6.435 |   159.14 |
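
The idea in the description (quantizing through the row mapping IDs instead of copying rows to contiguous memory first) can be illustrated with a minimal CPU sketch. This is not the PR's CUDA code: the function names, the block size of 32, and the simplified symmetric 8-bit block format are all assumptions chosen for illustration. It only shows that both paths produce the same quantized rows while the second one skips the intermediate copy.

```python
# Hypothetical CPU illustration; names and the quantization format are assumptions,
# not the actual kernels or data layout used in ik_llama.cpp.

BLOCK = 32  # assumed block size for the simplified 8-bit format


def quantize_row(row, block=BLOCK):
    """Quantize one activation row into a list of (scale, int8 values) blocks."""
    out = []
    for start in range(0, len(row), block):
        chunk = row[start:start + block]
        amax = max(abs(x) for x in chunk)
        scale = amax / 127.0 if amax > 0 else 0.0
        q = [round(x / scale) if scale else 0 for x in chunk]
        out.append((scale, q))
    return out


def gather_then_quantize(acts, row_ids):
    """Two-step path: copy the selected rows to a contiguous buffer, then quantize."""
    contiguous = [list(acts[i]) for i in row_ids]  # extra pass over the data
    return [quantize_row(r) for r in contiguous]


def quantize_with_row_ids(acts, row_ids):
    """Fused path: read each source row through its mapping ID while quantizing."""
    return [quantize_row(acts[i]) for i in row_ids]


if __name__ == "__main__":
    acts = [[float((r * 7 + c * 3) % 11 - 5) for c in range(64)] for r in range(4)]
    row_ids = [2, 0, 2, 3]  # rows routed to one expert; repeats are allowed
    assert gather_then_quantize(acts, row_ids) == quantize_with_row_ids(acts, row_ids)
    print("both paths produce identical quantized rows")
```

On a GPU the two-step path corresponds to two kernel launches (copy, then quantize), so the fused variant removes one launch and one round trip through memory, which is consistent with the small gain reported in the tables above.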