ik_llama.cpp/iqk-quantize.cpp at 71725a918f9edee559a978397779486dce7c703a

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-03-03 10:30:27 +00:00

Iwan Kawrakow 753dbaeeb0 bitnet: remove iq1_bn lookup table storing +/- signs

The AVX2 implementation was the only one left using it, so
I decided to see if we can get a performant implementation
using the 0,1,2 lookup table. Turns out we can, and it is
even slightly faster than the sign-based table. We now
get PP-512 = 275 t/s and TG-128 = 57.7 t/s with 16 threads
on the Ryzen-7950X.

With only one lookup table left for iq1_bn, I renamed it to
iq1bn_grid_u16.
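
To illustrate why a 0,1,2 table can stand in for a table of +/- signs, here is a minimal scalar sketch. It is not the AVX2 kernel from this commit. It assumes the ternary weights -1, 0, +1 are stored as the unsigned values 0, 1, 2 and that activations are int8; the function names `dot_ternary_012` and `dot_ternary_signed` are hypothetical. The idea is that a dot product over the unsigned values, minus the sum of the activations, equals the dot product over the signed weights.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reference: weights given directly as -1, 0, +1.
int dot_ternary_signed(const int8_t* w, const int8_t* q, size_t n) {
    int acc = 0;
    for (size_t i = 0; i < n; ++i) acc += int(w[i]) * int(q[i]);
    return acc;
}

// Assumed storage: weights as 0, 1, 2, meaning -1, 0, +1.
// sum((w - 1) * q) == sum(w * q) - sum(q), so no sign lookup is needed.
int dot_ternary_012(const uint8_t* w012, const int8_t* q, size_t n) {
    int acc = 0;   // dot product with the unsigned 0,1,2 values
    int qsum = 0;  // sum of activations, subtracted once at the end
    for (size_t i = 0; i < n; ++i) {
        assert(w012[i] <= 2);
        acc  += int(w012[i]) * int(q[i]);
        qsum += int(q[i]);
    }
    return acc - qsum;
}
```

In a vectorized kernel the activation sum can typically be computed once per block and reused, which is one plausible reason an unsigned table need not be slower than a signed one.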

2024-06-25 18:19:11 +03:00

14 KiB