Commit Graph

  • a09de6eaef iq4_ks: faster dot product on Metal (#90) Kawrakow 2024-10-16 14:13:03 +03:00
  • 993ca95e9e iq4_ks: faster dot product on Metal (#90) Kawrakow 2024-10-16 14:13:03 +03:00
  • 3e0c2519d3 iq4_ks: faster dot product on Metal ik/metal_faster_iq4ks Iwan Kawrakow 2024-10-16 14:04:59 +03:00
  • 1882040c70 Minor iq3_k tweak Kawrakow 2024-10-14 18:13:11 +03:00
  • ff23008ed4 Minor iq3_k tweak Iwan Kawrakow 2024-10-14 18:13:11 +03:00
  • 250c325e7e iq3_k: fix and optimize Metal dot product (#87) Kawrakow 2024-10-14 10:46:41 +03:00
  • 302a6225a1 iq3_k: fix and optimize Metal dot product (#87) Kawrakow 2024-10-14 10:46:41 +03:00
  • 55f91a98f1 iq3_k: slightly faster Metal dot product ik/metal_fix_iq3k Iwan Kawrakow 2024-10-14 10:41:26 +03:00
  • 9802c771b8 iq3_k: fix Metal dot product Iwan Kawrakow 2024-10-14 10:15:25 +03:00
  • f61bd33a04 Fix and optimize iq2k Metal implementation (#86) Kawrakow 2024-10-13 14:30:30 +03:00
  • baab1d9a1e Fix and optimize iq2k Metal implementation (#86) Kawrakow 2024-10-13 14:30:30 +03:00
  • f74905d649 iq2_k: optimize Metal dot product ik/metal_fix_iq2k Iwan Kawrakow 2024-10-13 14:09:53 +03:00
  • 389250dbc2 I somehow broke iq2_k on Metal? - fix dot product Iwan Kawrakow 2024-10-13 13:52:26 +03:00
  • 332049da85 I somehow broke iq2_k on Metal? - fix dequantize Iwan Kawrakow 2024-10-13 13:47:46 +03:00
  • 67817fb5b9 IQ2_KS: 2.1875 bpw non-linear quantization (#85) Kawrakow 2024-10-13 13:34:30 +03:00
  • 910a134094 IQ2_KS: 2.1875 bpw non-linear quantization (#85) Kawrakow 2024-10-13 13:34:30 +03:00
  • f9f15c27b6 iq2_ks: faster Metal ik/iq2k_experiments Iwan Kawrakow 2024-10-13 12:23:14 +03:00
  • 5cafaf5481 iq2_ks: Metal Iwan Kawrakow 2024-10-13 11:04:03 +03:00
  • 550c40e27f iq2_ks: ARM_NEON Iwan Kawrakow 2024-10-13 09:47:11 +03:00
  • 18cdf624f8 iq2_ks: scalar dot product Iwan Kawrakow 2024-10-13 08:48:13 +03:00
  • 1f6e498dfa iq2_ks: AVX2 Iwan Kawrakow 2024-10-13 08:35:02 +03:00
  • c98243b10d iq2_ks: Zen4 Iwan Kawrakow 2024-10-12 18:14:00 +03:00
  • 15a8115fcf iq2_ks: WIP Iwan Kawrakow 2024-10-12 17:54:32 +03:00
  • 70e7b758f5 iq2_ks: WIP Iwan Kawrakow 2024-10-12 16:28:46 +03:00
  • aa36d90684 iq2_ks: CUDA works Iwan Kawrakow 2024-10-12 12:42:41 +03:00
  • 103c8c053a iq2ks: basics Iwan Kawrakow 2024-10-12 12:22:42 +03:00
  • 2b74703995 iq2k with make_qx_quants: adjust scale Iwan Kawrakow 2024-10-12 10:04:17 +03:00
  • e640a9ed88 iq2k: Try make_qx_quants for the scale Iwan Kawrakow 2024-10-12 09:52:49 +03:00
  • 9a6376af06 Experimenting Iwan Kawrakow 2024-10-12 07:49:55 +03:00
  • c4c70af543 Minor: printf -> LLAMA_LOG_INFO Kawrakow 2024-10-11 12:49:47 +03:00
  • c15de3654e Minor: printf -> LLAMA_LOG_INFO Iwan Kawrakow 2024-10-11 12:49:47 +03:00
  • 6a16fe2f4e Better model info (#84) Kawrakow 2024-10-10 18:21:24 +03:00
  • 70aca0b75c Better model info (#84) Kawrakow 2024-10-10 18:21:24 +03:00
  • e441c897a4 Better model info ik/better_model_info Iwan Kawrakow 2024-10-10 17:38:59 +03:00
  • e734e888e1 iq3_ks: AVX2 ik/iq3_ks Iwan Kawrakow 2024-10-10 10:48:42 +03:00
  • a1a3ce6886 iq3_ks: Metal - partially working Iwan Kawrakow 2024-10-10 10:32:42 +03:00
  • f4adef09e5 iq3_ks: slightly faster ARM_NEON Iwan Kawrakow 2024-10-10 08:53:15 +03:00
  • 42d85c58fb iq3_ks: ARM_NEON Iwan Kawrakow 2024-10-09 20:04:56 +03:00
  • 7c966a5eb4 iq3_ks: Zen4 Iwan Kawrakow 2024-10-09 18:54:34 +03:00
  • 0317ba5a01 iq3_ks: Fix CUDA dot product Iwan Kawrakow 2024-10-09 18:09:35 +03:00
  • 252c6b2d82 iq3_ks: CUDA works Iwan Kawrakow 2024-10-09 18:00:45 +03:00
  • 893ca1731c iq3_ks: basics Iwan Kawrakow 2024-10-09 17:27:05 +03:00
  • a10ccd65f3 New SOTA quantization: 4.25 bpw IQ4_KS (#83) Kawrakow 2024-10-09 12:54:40 +03:00
  • b30c9e10d8 New SOTA quantization: 4.25 bpw IQ4_KS (#83) Kawrakow 2024-10-09 12:54:40 +03:00
  • f61c37967a iq3_kl: use iq4_ks instead of iq4_k/iq4_xs ik/iq4_k_xxs Iwan Kawrakow 2024-10-09 12:50:43 +03:00
  • bb6eab889f iq4_xxs: rename to iq4_ks Iwan Kawrakow 2024-10-09 11:29:50 +03:00
  • 0e12b2919c iq4_xxs: slightly faster TG on Metal Iwan Kawrakow 2024-10-09 11:06:45 +03:00
  • bb522fb314 iq4_xxs: Metal Iwan Kawrakow 2024-10-09 10:45:41 +03:00
  • 5865c98a8a iq4_xxs: ARM_NEON Iwan Kawrakow 2024-10-09 09:52:48 +03:00
  • 9a3f445fc5 iq4_xxs: AVX2 Iwan Kawrakow 2024-10-09 09:11:11 +03:00
  • ee590519d2 Fix iq4_xs (Zen4) Iwan Kawrakow 2024-10-08 19:47:21 +03:00
  • c24ad0d1e7 iq4_xxs: Zen4 Iwan Kawrakow 2024-10-08 19:40:26 +03:00
  • 834af69e47 iq4_xxs: scalar CPU dot product Iwan Kawrakow 2024-10-08 17:11:42 +03:00
  • 81bd33213d iq4_xxs: CUDA dot product Iwan Kawrakow 2024-10-08 16:35:52 +03:00
  • 975292b6b9 iq4_xxs: this looks very viable compared to iq4_xs Iwan Kawrakow 2024-10-08 16:07:00 +03:00
  • 1dd6c40c15 WIP + adding iq3_kl quantization mix Iwan Kawrakow 2024-10-08 13:56:29 +03:00
  • 4c76471979 iq4_k_xxs: basics Iwan Kawrakow 2024-10-08 10:52:53 +03:00
  • df2bd86a31 WIP ik/qstats Iwan Kawrakow 2024-10-06 09:09:51 +03:00
  • 1ed460bcff WIP Iwan Kawrakow 2024-10-06 07:36:35 +03:00
  • c106b466b6 WIP Iwan Kawrakow 2024-10-05 19:19:10 +03:00
  • 403d4eef35 quantize-stats on transposed tensors Iwan Kawrakow 2024-10-05 17:57:52 +03:00
  • 6648952ed8 Fix compiler warnings Kawrakow 2024-10-04 16:17:36 +03:00
  • c0ddc644bb Fix compiler warnings Iwan Kawrakow 2024-10-04 16:17:36 +03:00
  • 65575488d9 Move scale fudge factors to quantization (#81) Kawrakow 2024-10-04 16:16:01 +03:00
  • fe36930c8b Move scale fudge factors to quantization (#81) Kawrakow 2024-10-04 16:16:01 +03:00
  • acaa4869af Move scale fudge factors to quantization ik/cleanup_fudge_factors Iwan Kawrakow 2024-10-04 16:10:08 +03:00
  • d2b53228f5 Move to c++17 projectwide (#80) Kawrakow 2024-10-04 14:43:26 +03:00
  • bc79091b0e Move to c++17 projectwide (#80) Kawrakow 2024-10-04 14:43:26 +03:00
  • a553eb191a Make the entire project c++17 ik/cpp_17 Iwan Kawrakow 2024-10-04 14:23:21 +03:00
  • 84ed711eec Slightly better Iwan Kawrakow 2024-10-04 14:18:44 +03:00
  • f1066edc4e Do not quantize activations if not necessary (#79) Kawrakow 2024-10-04 11:22:57 +03:00
  • 0bf4d99774 Do not quantize activations if not necessary (#79) Kawrakow 2024-10-04 11:22:57 +03:00
  • ed477f1cdc Do not quantize activations if not necessary also for MoE models ik/skip_unnecessary_quantize Iwan Kawrakow 2024-10-04 11:11:02 +03:00
  • 0b79e5ebbd Do not quantize activations if not necessary Iwan Kawrakow 2024-10-04 09:45:53 +03:00
  • b44d05dbe0 q6_0: Slightly faster Zen4/AVX2 (#78) Kawrakow 2024-10-02 18:09:47 +03:00
  • ba392802ef q6_0: Slightly faster Zen4/AVX2 (#78) Kawrakow 2024-10-02 18:09:47 +03:00
  • 38eb7fa499 q6_0: this is slightly better ik/faster_q60_avx2 Iwan Kawrakow 2024-10-02 18:07:55 +03:00
  • 9d1552a4fc Faster q6_0 on AVX2 Iwan Kawrakow 2024-10-02 17:27:10 +03:00
  • 4390096212 Fused unary(x)*y (#70) Kawrakow 2024-10-02 17:05:56 +03:00
  • 50b5e90112 Fused unary(x)*y (#70) Kawrakow 2024-10-02 17:05:56 +03:00
  • a8e932b734 Fused y*unary(x) op: Metal ik/fused_mul_unary Iwan Kawrakow 2024-09-30 10:48:28 +03:00
  • 6ada781597 Fused y*unary(x) op: dedicated CPU implementation for silu and gelu Iwan Kawrakow 2024-09-30 10:08:01 +03:00
  • 26cf34c9c3 Fused y*unary(x) op: CUDA Iwan Kawrakow 2024-09-30 08:57:06 +03:00
  • 6ef4f28aae Adding fused y*unary(x) op Iwan Kawrakow 2024-09-30 08:29:34 +03:00
  • 104e7e26c4 Adding Q6_0 (#77) Kawrakow 2024-10-02 15:22:13 +03:00
  • cce49832c1 Adding Q6_0 (#77) Kawrakow 2024-10-02 15:22:13 +03:00
  • 037bbd2d58 q6_0: can now be used for kv-cache on Metal ik/add_q60 Iwan Kawrakow 2024-10-02 14:54:25 +03:00
  • 0d0cd1ee68 q6_0: it now works on Metal Iwan Kawrakow 2024-10-02 14:42:32 +03:00
  • aae268f7be q6_0: dequantize works on Metal, but not vector dot product Iwan Kawrakow 2024-10-02 13:55:42 +03:00
  • 677fc29790 q6_0: works on ARM_NEON Iwan Kawrakow 2024-10-02 12:28:43 +03:00
  • a4b41b4870 q6_0: slightly better kv-cache result Iwan Kawrakow 2024-10-02 12:02:57 +03:00
  • 9e63f811e1 Add q6_0 to CPU flash attention Iwan Kawrakow 2024-10-02 11:34:10 +03:00
  • c255a14a45 Adding q6_0: CUDA cpy, so Q6_0 can be used for KV-cache Iwan Kawrakow 2024-10-02 10:50:37 +03:00
  • 4cdf9b333f Adding q6_0: CUDA mmvq works Iwan Kawrakow 2024-10-02 10:34:24 +03:00
  • 6b5c7c378e Adding q6_0: CUDA dequantize works, but not mmvq Iwan Kawrakow 2024-10-02 10:25:22 +03:00
  • 43c74f06da Adding q6_0 - basics + AVX2/Zen4 working Iwan Kawrakow 2024-10-02 09:25:00 +03:00
  • 6dec4112a1 iq4_nl: faster quantization (#76) Kawrakow 2024-10-02 08:17:00 +03:00
  • d6909ed6f0 iq4_nl: faster quantization (#76) Kawrakow 2024-10-02 08:17:00 +03:00
  • 1fb3115412 iq4_nl: faster quantization ik/faster_iq4nl_quantize Iwan Kawrakow 2024-10-02 07:43:09 +03:00
  • d2c74a369b Fix Q5_0 flash attention (#75) Kawrakow 2024-10-01 15:52:35 +03:00