ik_llama.cpp/ggml
Kawrakow b94cd3b632 Refactor iqk_mul_mat.cpp (#435)
* Refactor iqk: WIP

* Refactor iqk: Factor out float GEMM (AVX2/AVX512)

* Refactor iqk: Factor out GEMM for legacy quants (AVX2/AVX512)

* Refactor iqk: Factor out GEMM for k-quants (AVX2/AVX512)

* Refactor iqk: fix AVX2

* Refactor iqk: Factor out GEMM for i-quants (AVX2/AVX512)

* Refactor iqk: fix AVX2

* Refactor iqk: Factor out GEMM for iqk-quants (AVX2/AVX512)

* Refactor iqk: fix AVX2

* Refactor iqk: Factor out GEMM for 1-bit quants (AVX2/AVX512)

* Refactor iqk: fix AVX2

* Refactor iqk: Factor out GEMM for iq1_bn, iq2_bn, iq2_bn_r4

* Refactor iqk: Factor out GEMM for repacked legacy quants

* Refactor iqk: Factor out GEMM for q8_K_R8, q8_KV

* Refactor iqk: Factor out GEMM for repacked i-quants

* Refactor iqk: GEMM kernels are refactored on AVX2/AVX512

* Refactor iqk: factor out 1-bit quants (NEON)

* Refactor iqk: factor out k-quants (NEON)

* Refactor iqk: factor out floats (NEON)

* Also iq4_xs belongs to k-quants

* Refactor iqk: factor out iqk quants (NEON)

* Refactor iqk: factor out legacy quants (NEON)

* Refactor iqk: factor out repacked legacy quants (NEON)

* Refactor iqk: factor out repacked k-quants (NEON)

* Refactor iqk: factor out repacked iqk quants (NEON)

* Refactor iqk: GEMM kernels are refactored on NEON

* Refactor iqk: FA compiles

Whether it works is a different story.
Current compile time: 107.3 seconds on the Ryzen-7950X

* Refactor iqk: FA refactored (Zen4)

Compile time for the FA files is now ~21 seconds on my
Ryzen-7950X, so still slightly too long for my taste
but much better than the 142 seconds we had before.

* Adding forgotten file

* Most helpers don't need to be templates

Also hide Q4_0 and Q8_KV behind IQK_FA_ALL_QUANTS.

Compilation time drops to 14 seconds on the Ryzen-5975WX

* Fix bf16

* Refactor iqk: FA refactored (NEON)

* Forgotten MMQ ref and typo (#431)

* Adding forgotten iq5_k_r4

* Fix iq4_k_r4 on NEON

* Fix iq4_ks on NEON

It was broken before the refactoring (the shifts were not correctly
applied).

* Fix q8_0 on NEON

* Fix q6_0 K cache

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Co-authored-by: Nexes the Elder <124105151+Nexesenex@users.noreply.github.com>
2025-05-22 10:05:51 +03:00