ik_llama.cpp/github-data/pull_requests/610 - q8_k_r8_ experimental AVX512 version.md at 608ff761703613fafe1259cd564ce34f240699d3 - ik_llama.cpp

ikawrakow/ik_llama.cpp

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-04-30 11:21:56 +00:00

Thomas 0451f10a42 Add GitHub data: filename sanitization (#640 )

2025-07-23 13:31:53 +02:00

🔀 #610 - q8_k_r8: experimental AVX512 version

| Field | Value |
| :--- | :--- |
| Author | `ikawrakow` |
| State | ✅ Open |
| Created | 2025-07-14 |
| Updated | 2025-07-18 |

Description

@ubergarm This is specifically for your 9950X CPU.

On my 7950X this is ~10% slower than what we have on the main branch. The 7950X supports AVX512, but 512-bit instructions get executed as two 256-bit instructions. Hence, I'm expecting (hoping?) this Q8_K_R8 GEMM version to be significantly faster on a CPU with "real" 512-bit instructions such as the 9950X.

Please benchmark it so I can decide if it is worth adding this to the main branch.
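
Before benchmarking, it can help to confirm which AVX512 subsets the test machine reports. The following is a minimal sketch, not code from this PR: `avx512_summary` is a hypothetical helper name, and it relies on the GCC/Clang builtins `__builtin_cpu_init` and `__builtin_cpu_supports`. Note that it only reports instruction-set support. It cannot tell whether 512-bit instructions run natively or as two 256-bit operations, so both the 7950X and the 9950X would be expected to report AVX512 here; only a benchmark shows the difference.

```cpp
#include <string>

// Hypothetical helper (not part of ik_llama.cpp): summarizes which AVX512
// subsets the running CPU reports. On non-x86 targets or unsupported
// compilers it returns "not-x86" instead of probing.
std::string avx512_summary() {
#if (defined(__x86_64__) || defined(__i386__)) && (defined(__GNUC__) || defined(__clang__))
    __builtin_cpu_init();  // initialize the CPU feature data before querying it
    std::string s;
    s += __builtin_cpu_supports("avx512f")    ? "avx512f=yes"     : "avx512f=no";
    s += __builtin_cpu_supports("avx512bw")   ? " avx512bw=yes"   : " avx512bw=no";
    s += __builtin_cpu_supports("avx512vnni") ? " avx512vnni=yes" : " avx512vnni=no";
    return s;
#else
    return "not-x86";
#endif
}
```

Printing the returned string from `main` (for example with `std::puts(avx512_summary().c_str())`) gives a one-line record that can be posted alongside the benchmark numbers.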
