ik_llama.cpp

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-04-26 09:29:27 +00:00

Files

Kawrakow 8c7e1b72d7 Much faster prompt processing for I-quants (ARM_NEON) (#550)

* iq2_xxs

55.8 t/s -> 167.5 t/s. iq2_xxs_r4 is at 93.7 t/s.

* iq2_xs

46.4 t/s -> 166.6 t/s. iq2_xs_r4 is at 72.3 t/s.

* iq2_s

42.8 t/s -> 166.8 t/s. iq2_s_r4 is at 71.1 t/s.

* iq3_xxs

51.8 t/s -> 165.6 t/s. iq3_xxs_r4 is at 84.6 t/s.

* iq3_s

46.0 t/s -> 162.0 t/s. iq3_s_r4 is at 79.4 t/s.

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>

2025-06-23 15:50:24 +02:00

Name            Last commit                                                   Date
cmake           Merge mainline llama.cpp (#3)                                 2024-07-27 07:55:01 +02:00
include         Fix non rpc build error (#506)                                2025-06-08 17:27:00 +03:00
src             Much faster prompt processing for I-quants (ARM_NEON) (#550)  2025-06-23 15:50:24 +02:00
.gitignore      Merge mainline llama.cpp (#3)                                 2024-07-27 07:55:01 +02:00
CMakeLists.txt  Better strategy for GPU offload (#520)                        2025-06-12 19:25:11 +03:00