ikawrakow/ik_llama.cpp

🔀 #331 - Better gemm/gemv on AVX2 for q4_0_r8

| Author | `ikawrakow` |
|---|---|
| State | ❌ Closed |
| Created | 2025-04-15 |
| Updated | 2025-04-15 |

Description

I constantly get confused about how many `int16_t` dot products (`_mm256_maddubs_epi16()` results) I can sum as `int16_t` before they overflow. In the case of Q4_0 I was adding too few, which left one unnecessary `_mm256_madd_epi16` in the computation. This PR fixes that. The result is a ~10% performance gain when tested with Gemma-3-12B-Instruct.
