Files
ik_llama.cpp/github-data/pull_requests/610 - q8_k_r8_ experimental AVX512 version.md
2025-07-23 13:31:53 +02:00

716 B

🔀 #610 - q8_k_r8: experimental AVX512 version

Author ikawrakow
State Open
Created 2025-07-14
Updated 2025-07-18

Description

@ubergarm This is specifically for your 9950X CPU.

On my 7950X this is ~10% slower than what we have on the main branch. The 7950X supports AVX512, but 512-bit instructions get executed as two 256-bit instructions. Hence, I'm expecting (hoping?) this Q8_K_R8 GEMM version to be significantly faster on a CPU with "real" 512-bit instructions such as the 9950X.

Please benchmark it so I can decide if it is worth adding this to the main branch.