Commit Graph

3736 Commits

Author SHA1 Message Date
Iwan Kawrakow
7090f171e1 Fix iq4_k_r4 on NEON 2025-05-19 19:46:44 +03:00
Iwan Kawrakow
06efa17fa9 Adding forgotten iq5_k_r4 2025-05-19 18:00:16 +03:00
Nexes the Elder
380ab3f33a Forgotten MMQ ref and typo (#431) 2025-05-19 17:18:03 +03:00
Iwan Kawrakow
65c8e860bf Refactor iqk: FA refactored (NEON) 2025-05-19 17:16:00 +03:00
Iwan Kawrakow
9ae8f75114 Fix bf16 2025-05-19 15:30:46 +03:00
Iwan Kawrakow
9541631a52 Most helpers don't need to be templates
Also hide Q4_0 and Q8_KV behind IQK_FA_ALL_QUANTS.

Compilation time drops to 14 seconds on the Ryzen-5975WX
2025-05-19 15:20:43 +03:00
Iwan Kawrakow
fbfe79e2fe Adding forgotten file 2025-05-19 13:42:20 +03:00
Iwan Kawrakow
630279cb54 Refactor iqk: FA refactored (Zen4)
Compile time for the FA files is now ~21 seconds on my
Ryzen-7950X, so still slightly too long for my taste
but much better than the 142 seconds we had before.
2025-05-19 13:38:38 +03:00
Iwan Kawrakow
131e5ac6df Refactor iqk: FA compiles
Whether it works is a different story.
Current compile time: 107.3 seconds on the Ryzen-7950X
2025-05-19 11:43:02 +03:00
Iwan Kawrakow
4b4b4fdcac Refactor iqk: GEMM kernels are refactored on NEON 2025-05-19 08:36:16 +03:00
Iwan Kawrakow
7aa2de6d5a Refactor iqk: factor out repacked iqk quants (NEON) 2025-05-19 08:25:56 +03:00
Iwan Kawrakow
7e59d2b974 Refactor iqk: factor out repacked k-quants (NEON) 2025-05-19 08:11:24 +03:00
Iwan Kawrakow
2b8a231d87 Refactor iqk: factor out repacked legacy quants (NEON) 2025-05-19 07:51:28 +03:00
Iwan Kawrakow
bd1e4d4909 Refactor iqk: factor out legacy quants (NEON) 2025-05-18 19:47:53 +03:00
Iwan Kawrakow
465d717bb9 Refactor iqk: factor out iqk quants (NEON) 2025-05-18 19:06:46 +03:00
Iwan Kawrakow
312413694f Also iq4_xs belongs to k-quants 2025-05-18 18:14:45 +03:00
Iwan Kawrakow
f4ab917e9e Refactor iqk: factor out floats (NEON) 2025-05-18 18:09:39 +03:00
Iwan Kawrakow
c805a19202 Refactor iqk: factor out k-quants (NEON) 2025-05-18 17:41:54 +03:00
Iwan Kawrakow
28b94800c1 Refactor iqk: factor out 1-bit quants (NEON) 2025-05-18 16:54:44 +03:00
Iwan Kawrakow
c63a0af5b7 Refactor iqk: GEMM kernels are refactored on AVX2/AVX512 2025-05-18 15:50:20 +03:00
Iwan Kawrakow
0d96f3bd37 Refactor iqk: Factor out GEMM for repacked i-quants 2025-05-18 14:51:59 +03:00
Iwan Kawrakow
f501200d42 Refactor iqk: Factor out GEMM for q8_K_R8, q8_KV 2025-05-18 14:02:07 +03:00
Iwan Kawrakow
6cd3609a85 Refactor iqk: Factor out GEMM for repacked legacy quants 2025-05-18 10:20:54 +03:00
Iwan Kawrakow
7868545062 Refactor iqk: Factor out GEMM for iq1_bn, iq2_bn, iq2_bn_r4 2025-05-17 19:53:48 +03:00
Iwan Kawrakow
d66ec60836 Refactor iqk: fix AVX2 2025-05-17 19:29:55 +03:00
Iwan Kawrakow
9b6e75cb79 Refactor iqk: Factor out GEMM for 1-bit quants (AVX2/AVX512) 2025-05-17 18:28:24 +03:00
Iwan Kawrakow
082a9bd632 Refactor iqk: fix AVX2 2025-05-17 17:45:32 +03:00
Iwan Kawrakow
de5660cee3 Refactor iqk: Factor out GEMM for iqk-quants (AVX2/AVX512) 2025-05-17 17:34:34 +03:00
Iwan Kawrakow
8dae13cd84 Refactor iqk: fix AVX2 2025-05-17 16:43:53 +03:00
Iwan Kawrakow
2cbbc5581f Refactor iqk: Factor out GEMM for i-quants (AVX2/AVX512) 2025-05-17 16:34:25 +03:00
Iwan Kawrakow
d355ff997b Refactor iqk: fix AVX2 2025-05-17 15:45:15 +03:00
Iwan Kawrakow
4ef94c26fb Refactor iqk: Factor out GEMM for k-quants (AVX2/AVX512) 2025-05-17 15:34:56 +03:00
Iwan Kawrakow
f83e64dcb6 Refactor iqk: Factor out GEMM for legacy quants (AVX2/AVX512) 2025-05-17 14:32:00 +03:00
Iwan Kawrakow
51a87cf20d Refactor iqk: Factor out float GEMM (AVX2/AVX512) 2025-05-17 13:41:39 +03:00
Iwan Kawrakow
68b782e861 Refactor iqk: WIP 2025-05-17 12:31:39 +03:00
Kawrakow
b3036a872f Option to enable/disable the IQK CPU FA kernels (#429)
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2025-05-17 11:21:58 +03:00
Kawrakow
c35a383bcd Zen4: Faster PP for IQ2_KS, IQ4_KS, IQ5_KS (#428)
* Zen4: faster PP for iq4_ks and iq5_ks

* Zen4: faster PP for iq2_ks

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2025-05-17 10:42:33 +03:00
Kawrakow
7abdf2b099 IQ5_KS_R4: row-interleaved IQ5_KS (#426)
* iq5_ks_r4: basics

* iq5_ks_r4: Zen4 works

* iq5_ks_r4: AVX2 works

* iq5_ks_r4: NEON

* Fix iq5_ks on NEON

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2025-05-17 08:57:26 +03:00
Kawrakow
134d548173 Fix AVX2 implementation of IQ4_K, IQ4_KS, IQ5_K, IQ6_K (#427)
* Fix IQ4_K on AVX2

* Fix IQ4_KS on AVX2

* Fix IQ5_K on AVX2

* Fix IQ6_K on AVX2

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2025-05-16 17:25:15 +03:00
Kawrakow
34ae71c4d7 Adding forgotten template instance for iq5_ks (#424)
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2025-05-15 16:50:15 +03:00
Kawrakow
3d92d7f802 Adding IQ5_KS - 5.25 bpw quants (#422)
* iq5_ks: basics

* iq5_ks: quantize

* iq5_ks: CUDA dequantize works

* iq5_ks: dot product works on CUDA

* iq5_ks: MMQ works

* iq5_ks: Zen4

* iq5_ks: AVX2

But it is not quite right, just like iq4_k, iq5_k, iq6_k, iq4_ks.
All these need fixing on AVX2.

* iq5_ks: NEON

* iq5_ks: Metal dequantize

* iq5_ks: Metal dot product

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2025-05-15 16:02:39 +03:00
Kawrakow
3f8c865b92 Fix standard attention on the CPU (#421)
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2025-05-15 08:43:39 +03:00
Kawrakow
14ed9fb44d CUDA: quantized GEMM for IQ2_KS, IQ2_K, IQ3_K (#418)
* MMQ for iq2_k

* This works

* MMQ for iq3_k

* MMQ for iq2_ks

* Fix iq2_ks

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2025-05-15 08:15:08 +03:00
Kawrakow
0435b68e6d CUDA: quantized GEMM for IQ4_K, IQ5_K, IQ6_K (#417)
* MMQ for iq4_k: WIP (not working)

* MMQ for iq4_k: working now

* MMQ for iq5_k

* Cleanup

* MMQ for iq5_k: slightly faster

* MMQ for iq6_k

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2025-05-14 14:04:11 +03:00
Kawrakow
b90d6ede2e Fix SER (CUDA) (#416)
* Fixing SER bugs

* Cleanup

* This seems to fix it.

* This seems to work

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2025-05-14 07:29:28 +03:00
Kawrakow
13740622e9 Fix SER (CPU) (#415)
* Fixing SER bugs

* Cleanup

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2025-05-13 17:55:04 +03:00
Kawrakow
0c57f84dc4 Fix imatrix calculation for MLA models (#411)
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2025-05-13 17:53:38 +03:00
Kawrakow
553c08b6b4 Better CPU FA performance for DeepSeek-Lite (#410)
* Better CPU FA performance for DeepSeek-Lite

* It must be like this

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2025-05-13 17:53:20 +03:00
Kawrakow
4ba6bbb44a Update README.md 2025-05-12 15:48:37 +03:00
Kawrakow
627f406437 Fix new CUDA FA on Turing (#413)
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2025-05-12 15:09:33 +03:00