ik_llama.cpp/ggml-cuda
Kawrakow 9918542658 bitnet: remove iq1_bn lookup table storing +/- signs
The AVX2 implementation was the only one left using it, so
I decided to see if we can get a performant implementation
using the 0,1,2 lookup table. Turns out we can, and it is
even slightly faster than the sign-based table. We now
get PP-512 = 275 t/s and TG-128 = 57.7 t/s with 16 threads
on the Ryzen-7950X.

With only one lookup table left for iq1_bn, I renamed it to
iq1bn_grid_u16.
2024-06-25 18:19:11 +03:00
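
To illustrate why a table of 0,1,2 values can stand in for a table of +/- signs, here is a minimal sketch. It is not the code from this commit, and the table layout is an assumption made for illustration only: five ternary digits per index, two bits per digit, which is not claimed to match the real iq1bn_grid_u16. The names make_grid_u16, encode5 and dot_ternary are hypothetical. The idea shown is that a weight w in {-1, 0, +1} can be stored as q = w + 1, so that sum(w*y) = sum(q*y) - sum(y) and no sign handling is needed inside the loop.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical layout (assumption, not the real iq1bn_grid_u16):
// each table entry packs 5 ternary digits (0, 1 or 2), 2 bits per digit.
constexpr int kDigits  = 5;
constexpr int kEntries = 243;  // 3^5 valid base-3 indices

// Build the 0,1,2 lookup table. Entries 243..255 stay zero (unused).
std::array<uint16_t, 256> make_grid_u16() {
    std::array<uint16_t, 256> grid{};
    for (int idx = 0; idx < kEntries; ++idx) {
        int rest = idx;
        uint16_t packed = 0;
        for (int d = 0; d < kDigits; ++d) {
            packed = static_cast<uint16_t>(packed | ((rest % 3) << (2 * d)));
            rest /= 3;
        }
        grid[idx] = packed;
    }
    return grid;
}

// Encode five weights in {-1, 0, +1} as one base-3 index (w[0] is the
// least significant digit), storing each weight as w + 1.
uint8_t encode5(const int8_t* w) {
    int idx = 0;
    for (int d = kDigits - 1; d >= 0; --d) idx = 3 * idx + (w[d] + 1);
    return static_cast<uint8_t>(idx);
}

// Dot product of ternary weights with int8 activations using only the
// 0,1,2 table: sum(w*y) = sum(q*y) - sum(y), where q = w + 1.
int dot_ternary(const std::array<uint16_t, 256>& grid,
                const std::vector<uint8_t>& idx,
                const std::vector<int8_t>& y) {
    int sum_qy = 0;
    int sum_y  = 0;
    for (std::size_t b = 0; b < idx.size(); ++b) {
        const uint16_t packed = grid[idx[b]];
        for (int d = 0; d < kDigits; ++d) {
            const int q  = (packed >> (2 * d)) & 3;
            const int yv = y[b * kDigits + d];
            sum_qy += q * yv;
            sum_y  += yv;
        }
    }
    return sum_qy - sum_y;
}
```

In this sketch the correction term sum(y) depends only on the activations, so it could be computed once per activation block and reused across all weight rows. Whether the actual AVX2 kernel is organized this way is not stated in the commit message.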