Iwan Kawrakow
7090f171e1
Fix iq4_k_r4 on NEON
2025-05-19 19:46:44 +03:00
Iwan Kawrakow
06efa17fa9
Adding forgotten iq5_k_r4
2025-05-19 18:00:16 +03:00
Nexes the Elder
380ab3f33a
Forgotten MMQ ref and typo ( #431 )
2025-05-19 17:18:03 +03:00
Iwan Kawrakow
65c8e860bf
Refactor iqk: FA refactored (NEON)
2025-05-19 17:16:00 +03:00
Iwan Kawrakow
9ae8f75114
Fix bf16
2025-05-19 15:30:46 +03:00
Iwan Kawrakow
9541631a52
Most helpers don't need to be templates
...
Also hide Q4_0 and Q8_KV behind IQK_FA_ALL_QUANTS.
Compilation time drops to 14 seconds on the Ryzen-5975WX
2025-05-19 15:20:43 +03:00
Iwan Kawrakow
fbfe79e2fe
Adding forgotten file
2025-05-19 13:42:20 +03:00
Iwan Kawrakow
630279cb54
Refactor iqk: FA refactored (Zen4)
...
Compile time for the FA files is now ~21 seconds on my
Ryzen-7950X, so still slightly too long for my taste
but much better than the 142 seconds we had before.
2025-05-19 13:38:38 +03:00
Iwan Kawrakow
131e5ac6df
Refactor iqk: FA compiles
...
If it works is a different story.
Current compile time: 107.3 seconds on the Ryzen-7950X
2025-05-19 11:43:02 +03:00
Iwan Kawrakow
4b4b4fdcac
Refactor iqk: GEMM kernels are refactored on NEON
2025-05-19 08:36:16 +03:00
Iwan Kawrakow
7aa2de6d5a
Refactor iqk: factor out repacked iqk quants (NEON)
2025-05-19 08:25:56 +03:00
Iwan Kawrakow
7e59d2b974
Refactor iqk: factor out repacked k-quants (NEON)
2025-05-19 08:11:24 +03:00
Iwan Kawrakow
2b8a231d87
Refactor iqk: factor out repacked legacy quants (NEON)
2025-05-19 07:51:28 +03:00
Iwan Kawrakow
bd1e4d4909
Refactor iqk: factor out legacy quants (NEON)
2025-05-18 19:47:53 +03:00
Iwan Kawrakow
465d717bb9
Refactor iqk: factor out iqk quants (NEON)
2025-05-18 19:06:46 +03:00
Iwan Kawrakow
312413694f
Also iq4_xs belongs to k-quants
2025-05-18 18:14:45 +03:00
Iwan Kawrakow
f4ab917e9e
Refactor iqk: factor out floats (NEON)
2025-05-18 18:09:39 +03:00
Iwan Kawrakow
c805a19202
Refactor iqk: factor out k-quants (NEON)
2025-05-18 17:41:54 +03:00
Iwan Kawrakow
28b94800c1
Refactor iqk: factor out 1-bit quants (NEON)
2025-05-18 16:54:44 +03:00
Iwan Kawrakow
c63a0af5b7
Refactor iqk: GEMM kernels are refactored on AVX2/AVX512
2025-05-18 15:50:20 +03:00
Iwan Kawrakow
0d96f3bd37
Refactor iqk: Factor out GEMM for repacked i-quants
2025-05-18 14:51:59 +03:00
Iwan Kawrakow
f501200d42
Refactor iqk: Factor out GEMM for q8_K_R8, q8_KV
2025-05-18 14:02:07 +03:00
Iwan Kawrakow
6cd3609a85
Refactor iqk: Factor out GEMM for repacked legacy quants
2025-05-18 10:20:54 +03:00
Iwan Kawrakow
7868545062
Refactor iqk: Factor out GEMM for iq1_bn, iq2_bn, iq2_bn_r4
2025-05-17 19:53:48 +03:00
Iwan Kawrakow
d66ec60836
Refactor iqk: fix AVX2
2025-05-17 19:29:55 +03:00
Iwan Kawrakow
9b6e75cb79
Refactor iqk: Factor out GEMM for 1-bit quants (AVX2/AVX512)
2025-05-17 18:28:24 +03:00
Iwan Kawrakow
082a9bd632
Refactor iqk: fix AVX2
2025-05-17 17:45:32 +03:00
Iwan Kawrakow
de5660cee3
Refactor iqk: Factor out GEMM for iqk-quants (AVX2/AVX512)
2025-05-17 17:34:34 +03:00
Iwan Kawrakow
8dae13cd84
Refactor iqk: fix AVX2
2025-05-17 16:43:53 +03:00
Iwan Kawrakow
2cbbc5581f
Refactor iqk: Factor out GEMM for i-quants (AVX2/AVX512)
2025-05-17 16:34:25 +03:00
Iwan Kawrakow
d355ff997b
Refactor iqk: fix AVX2
2025-05-17 15:45:15 +03:00
Iwan Kawrakow
4ef94c26fb
Refactor iqk: Factor out GEMM for k-quants (AVX2/AVX512)
2025-05-17 15:34:56 +03:00
Iwan Kawrakow
f83e64dcb6
Refactor iqk: Factor out GEMM for legacy quants (AVX2/AVX512)
2025-05-17 14:32:00 +03:00
Iwan Kawrakow
51a87cf20d
Refactor iqk: Factor out float GEMM (AVX2/AVX512)
2025-05-17 13:41:39 +03:00
Iwan Kawrakow
68b782e861
Refactor iqk: WIP
2025-05-17 12:31:39 +03:00
Kawrakow
b3036a872f
Option to enable/disable the IQK CPU FA kernels ( #429 )
...
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2025-05-17 11:21:58 +03:00
Kawrakow
c35a383bcd
Zen4: Faster PP for IQ2_KS, IQ4_KS, IQ5_KS ( #428 )
...
* Zen4: faster PP for iq4_ks and iq5_ks
* Zen4: faster PP for iq2_ks
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2025-05-17 10:42:33 +03:00
Kawrakow
7abdf2b099
IQ5_KS_R4: row-interleaved IQ5_KS ( #426 )
...
* iq5_ks_r4: basics
* iq5_ks_r4: Zen4 works
* iq5_ks_r4: AVX2 works
* iq5_ks_r4: NEON
* Fix iq5_ks on NEON
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2025-05-17 08:57:26 +03:00
Kawrakow
134d548173
Fix AVX2 implementation of IQ4_K, IQ4_KS, IQ5_K, IQ6_K ( #427 )
...
* Fix IQ4_K on AVX2
* Fix IQ4_KS on AVX2
* Fix IQ5_K on AVX2
* Fix IQ6_K on AVX2
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2025-05-16 17:25:15 +03:00
Kawrakow
34ae71c4d7
Adding forgotten template instance for iq5_ks ( #424 )
...
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2025-05-15 16:50:15 +03:00
Kawrakow
3d92d7f802
Adding IQ5_KS - 5.25 bpw quants ( #422 )
...
* iq5_ks: basics
* iq5_ks: quantize
* iq5_ks: CUDA dequantize works
* iq5_ks: dot product works on CUDA
* iq5_ks: MMQ works
* iq5_ks: Zen4
* iq5_ks: AVX2
But it is not quite right, just like iq4_k, iq5_k, iq6_k, iq4_ks.
All these need fixing on AVX2.
* iq5_ks: NEON
* iq5_ks: Metal dequantize
* iq5_ks: Metal dot product
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2025-05-15 16:02:39 +03:00
Kawrakow
3f8c865b92
Fix standard attention on the CPU ( #421 )
...
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2025-05-15 08:43:39 +03:00
Kawrakow
14ed9fb44d
CUDA: quantized GEMM for IQ2_KS, IQ2_K, IQ3_K ( #418 )
...
* MMQ for iq2_k
* This works
* MMQ for iq3_k
* MMQ for iq2_ks
* Fix iq2_ks
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2025-05-15 08:15:08 +03:00
Kawrakow
0435b68e6d
CUDA: quantized GEMM for IQ4_K, IQ5_K, IQ6_K ( #417 )
...
* MMQ for iq4_k: WIP (not working)
* MMQ for iq4_k: working now
* MMQ for iq5_k
* Cleanup
* MMQ for iq5_k: slightly faster
* MMQ for iq6_k
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2025-05-14 14:04:11 +03:00
Kawrakow
b90d6ede2e
Fix SER (CUDA) ( #416 )
...
* Fixing SER bugs
* Cleanup
* This seems to fix it.
* This seems to work
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2025-05-14 07:29:28 +03:00
Kawrakow
13740622e9
Fix SER (CPU) ( #415 )
...
* Fixing SER bugs
* Cleanup
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2025-05-13 17:55:04 +03:00
Kawrakow
0c57f84dc4
Fix imatrix calculation for MLA models ( #411 )
...
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2025-05-13 17:53:38 +03:00
Kawrakow
553c08b6b4
Better CPU FA performance for DeepSeek-Lite ( #410 )
...
* Better CPU FA performance for DeepSeek-Lite
* It must be like this
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2025-05-13 17:53:20 +03:00
Kawrakow
4ba6bbb44a
Update README.md
2025-05-12 15:48:37 +03:00
Kawrakow
627f406437
Fix new CUDA FA on Turing ( #413 )
...
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2025-05-12 15:09:33 +03:00