ik_llama.cpp/ggml/src
Kawrakow fdfbd98022 Faster IQ1_BN Metal implementation (#107)
* iq1_bn: faster Metal dot product

82 -> 87.9 t/s for TG-128

* iq1_bn(Metal): 87.9 -> 89.0 t/s for TG-128

* iq1_bn(Metal): 89.0 -> 94.7 t/s for TG-128

So, total TG-128 improvement is ~15% (82 -> 94.7 t/s). Not bad.

* iq1_bn(Metal): 686 -> 702 t/s for PP-512

* iq2_bn(Metal): 710 -> 714 t/s for PP-512

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2024-10-26 10:59:59 +02:00