Commit Graph

  • bb699e1e6b Q6_0_R4 (#122) Kawrakow 2024-12-03 14:48:26 +01:00
  • f1f4eb988f Q6_0_R4 (#122) Kawrakow 2024-12-03 14:48:26 +01:00
  • 7e60462c08 Fix AVX2 ik/q6_0_r4 Iwan Kawrakow 2024-12-03 15:43:11 +02:00
  • 1d2cdf2f58 q6_0_r4: NEON Iwan Kawrakow 2024-12-03 14:32:03 +01:00
  • e345ee837d Adding q6_0_r4 Iwan Kawrakow 2024-12-03 15:08:28 +02:00
  • d9593f3689 Q5_0_R4 (#121) Kawrakow 2024-12-03 12:59:22 +01:00
  • c5bf589367 Q5_0_R4 (#121) Kawrakow 2024-12-03 12:59:22 +01:00
  • 5bbdfc8bac q5_0_r4: NEON ik/q5_0_r4 Iwan Kawrakow 2024-12-03 11:09:34 +01:00
  • fad847d753 Adding q5_0_r4 Iwan Kawrakow 2024-12-03 11:29:57 +02:00
  • 6b26cb05f5 Q8_0_R4 (#120) Kawrakow 2024-12-03 06:15:29 +01:00
  • ccec00939a Q8_0_R4 (#120) Kawrakow 2024-12-03 06:15:29 +01:00
  • 4559dc5931 q8_0_r4: Zen4 matrix-vector specialization ik/q8_0_r4 Iwan Kawrakow 2024-12-03 06:52:12 +02:00
  • fc781beec0 q8_0_r4: NEON Iwan Kawrakow 2024-12-02 18:40:31 +01:00
  • e1b922d3f3 Adding q8_0_r4 Iwan Kawrakow 2024-12-02 19:22:04 +02:00
  • 61304f5c04 Q4_0_R4 (#119) Kawrakow 2024-12-02 17:01:48 +01:00
  • 239a344f99 Q4_0_R4 (#119) Kawrakow 2024-12-02 17:01:48 +01:00
  • 2b69d1def8 q4_0_r4: NEON ik/q4_0_r4 Iwan Kawrakow 2024-12-02 16:26:14 +01:00
  • 63053e1e91 Adding iq4_0_r4 - q4_0 repacked Iwan Kawrakow 2024-12-02 16:54:25 +02:00
  • 72d94fbf22 IQ4_NL_X4 (#118) Kawrakow 2024-12-02 07:25:39 +01:00
  • 6d0462d4a3 IQ4_NL_X4 (#118) Kawrakow 2024-12-02 07:25:39 +01:00
  • c539fe0a51 iq4_nl_x4: NEON specialization for matrix x vector ik/iq4_nl_x4 Iwan Kawrakow 2024-12-01 18:37:09 +01:00
  • 9702ebd078 iq4_nl_x4: minor NEON improvement and cleanup Iwan Kawrakow 2024-12-01 17:15:31 +01:00
  • e2ba3d53e5 iq4_nl_x4: NEON Iwan Kawrakow 2024-12-01 16:03:36 +01:00
  • 1a8ee094ad iq4_nl_x4: AVX2 Iwan Kawrakow 2024-11-30 20:22:34 +02:00
  • 9982d420ef iq4_nl_x4: getting amazing Iwan Kawrakow 2024-11-30 19:32:45 +02:00
  • 422e5768e4 Adding iq4_nl_x4 Iwan Kawrakow 2024-11-30 09:21:04 +02:00
  • 93e2c97a8b iq4_kss attempt - not as good as original ik/iq4kss_experiments Iwan Kawrakow 2024-11-29 11:46:10 +02:00
  • 2640bd9ea4 Minor q2_K quantization improvement ik/iq2ks_experiments Iwan Kawrakow 2024-11-25 19:16:24 +02:00
  • 2f749cfdba iq2k improvement Iwan Kawrakow 2024-11-25 18:19:16 +02:00
  • 85d1011f52 Another iq3k improvement Iwan Kawrakow 2024-11-25 10:11:02 +02:00
  • 55db84400a Small iq3k improvement Iwan Kawrakow 2024-11-24 18:25:49 +02:00
  • 74e3b1fad7 Minor Iwan Kawrakow 2024-11-24 17:11:11 +02:00
  • 65ebc6f986 iq4_ks: minor PPL improvement Iwan Kawrakow 2024-11-24 12:01:18 +02:00
  • 70815ec5b2 iq2k: quantization improvement Iwan Kawrakow 2024-11-24 11:29:37 +02:00
  • 7447c55a8a iq2k: small PPL improvement Iwan Kawrakow 2024-11-23 19:18:45 +02:00
  • 3cac58e182 iq2ks: small PPL improvement Iwan Kawrakow 2024-11-23 12:27:14 +02:00
  • 6c73f704ca Use Q6_0 instead of Q5_1 for tensors incompatible with IQ5_K/Q5_K (#116) Nexes the Elder 2024-11-21 07:12:57 +01:00
  • 8ad84b9fab Use Q6_0 instead of Q5_1 for tensors incompatible with IQ5_K/Q5_K (#116) Nexes the Elder 2024-11-21 07:12:57 +01:00
  • 3a9926b932 Checkpoint Iwan Kawrakow 2024-11-19 17:31:07 +02:00
  • 2be4cffe66 Minor tweaks Iwan Kawrakow 2024-11-18 15:08:12 +02:00
  • 5705dc7f2e Report actual bpw Iwan Kawrakow 2024-11-15 17:05:18 +02:00
  • 3ee5434601 DRY Iwan Kawrakow 2024-11-15 17:01:53 +02:00
  • 81cd220f93 iq4_kt: CUDA dot product works Iwan Kawrakow 2024-11-15 16:51:43 +02:00
  • 79565c92e0 DRY Iwan Kawrakow 2024-11-15 16:12:37 +02:00
  • e338e0a0cd DRY Iwan Kawrakow 2024-11-15 15:59:49 +02:00
  • 4cf82e7e2f iq4_kt: failed attempt to adjust CUDA dot product Iwan Kawrakow 2024-11-15 14:53:58 +02:00
  • ab1cef30e7 iq4_kt: very slightly better Iwan Kawrakow 2024-11-15 12:27:10 +02:00
  • 1be0a9e0d7 iq4_kt: go to 4.0 bpw Iwan Kawrakow 2024-11-15 09:38:47 +02:00
  • 21903f19b4 WIP Iwan Kawrakow 2024-11-14 15:53:33 +02:00
  • c20b22b9a0 iq3_kt: small progress Iwan Kawrakow 2024-11-14 13:59:22 +02:00
  • 4213ab1cb3 iq2_kt: SOTA Iwan Kawrakow 2024-11-14 11:55:55 +02:00
  • 215bea5c6a iq3_kt: small improvements and faster quantization Iwan Kawrakow 2024-11-13 16:36:08 +02:00
  • dbe085474a iq2_kt: SOTA Iwan Kawrakow 2024-11-13 11:24:16 +02:00
  • 200a19f18f iq2_kt: SOTA Iwan Kawrakow 2024-11-13 07:27:15 +02:00
  • de7fe92833 iq4_kt: minor tweaks Iwan Kawrakow 2024-11-11 19:59:15 +02:00
  • e9ced1bbe6 iq4_kt: CUDA dot product Iwan Kawrakow 2024-11-11 19:10:56 +02:00
  • 21ee589996 WIP Iwan Kawrakow 2024-11-11 18:51:50 +02:00
  • 1d6ca83203 WIP Iwan Kawrakow 2024-11-11 15:30:02 +02:00
  • 00b4bff286 Adding iq4_kt - not competitive at this point Iwan Kawrakow 2024-11-11 12:34:00 +02:00
  • 47b28c1e92 iq2_kt: SOTA Iwan Kawrakow 2024-11-11 09:30:22 +02:00
  • 4608f0cc6d iq2_kt: SOTA Iwan Kawrakow 2024-11-10 17:21:32 +02:00
  • 0ffc9b435c iq3_kt: CUDA dot product Iwan Kawrakow 2024-11-10 12:07:42 +02:00
  • e9e5879b94 iq3_kt speed up quantization Iwan Kawrakow 2024-11-10 09:56:29 +02:00
  • c59830dafb iq3_kt WIP: speed up quantization Iwan Kawrakow 2024-11-10 09:22:26 +02:00
  • 8f0d075f5e iq3_kt WIP: slowly improving Iwan Kawrakow 2024-11-09 11:42:14 +02:00
  • dfcc8a9cf3 iq3_kt WIP: slowly improving Iwan Kawrakow 2024-11-09 09:32:00 +02:00
  • 386d139e13 WIP Iwan Kawrakow 2024-11-09 07:25:56 +02:00
  • f1fb59b44b iq3_kt WIP: slowly improving Iwan Kawrakow 2024-11-08 18:39:23 +02:00
  • 435eb9bdd3 WIP Iwan Kawrakow 2024-11-08 16:28:58 +02:00
  • 08503cec7d WIP Iwan Kawrakow 2024-11-08 09:12:57 +02:00
  • 977f94b3e0 Forgotten change Iwan Kawrakow 2024-11-07 19:04:33 +02:00
  • 4774788136 Adding iq3_kt Iwan Kawrakow 2024-11-07 19:02:06 +02:00
  • 590f47278b Minor Iwan Kawrakow 2024-11-07 14:56:13 +02:00
  • 7bf6e158a9 iq2_kt: faster f16 CUDA dot product Iwan Kawrakow 2024-11-07 14:35:22 +02:00
  • 7cafafc69e iq2_kt: faster f16 CUDA dot product Iwan Kawrakow 2024-11-07 14:20:34 +02:00
  • b354392c77 iq2_kt: f16 CUDA dot product Iwan Kawrakow 2024-11-07 12:32:10 +02:00
  • aed3910dfa iq2_kt: very slightly faster CUDA dot product Iwan Kawrakow 2024-11-07 11:24:23 +02:00
  • d2331b9287 iq2_kt: CUDA dot product Iwan Kawrakow 2024-11-07 11:01:11 +02:00
  • b3dfe9984b iq2_kt - even better Iwan Kawrakow 2024-11-07 08:38:20 +02:00
  • 36e9c922b8 iq2_kt - this is better Iwan Kawrakow 2024-11-06 20:49:56 +02:00
  • 766fa600c8 WIP - try larger blocks Iwan Kawrakow 2024-11-06 16:24:17 +02:00
  • 86948f9c5d WIP Iwan Kawrakow 2024-11-06 11:57:13 +02:00
  • a961a48e88 WIP Iwan Kawrakow 2024-11-06 11:05:07 +02:00
  • 426a6e685f iq2_kt: CUDA dequantize Iwan Kawrakow 2024-11-06 08:34:58 +02:00
  • a4f1ac8da4 iq2_kt: quantize / dequantize Iwan Kawrakow 2024-11-05 18:50:08 +02:00
  • f1df1b7e15 Testing Trellis quantization: playing with scales and generators Iwan Kawrakow 2024-11-05 16:20:09 +02:00
  • 9ec145550d Testing Trellis quantization: 4-bit quantized block scales Iwan Kawrakow 2024-11-05 14:57:50 +02:00
  • f21dd3fb15 Testing Trellis quantization Iwan Kawrakow 2024-11-05 14:11:14 +02:00
  • afe9db7143 WIP Iwan Kawrakow 2024-11-05 13:47:38 +02:00
  • c578478911 WIP Iwan Kawrakow 2024-11-05 13:32:31 +02:00
  • 798f93ce40 WIP Iwan Kawrakow 2024-11-05 11:38:26 +02:00
  • 01f53e313f Use Q6_0 instead of Q5_1 for tensors incompatible with IQ5_K/Q5_K (#116) ik/q60_mmq Nexes the Elder 2024-11-21 07:12:57 +01:00
  • f0a0503ec0 MMQ for Q6_0 (#115) Kawrakow 2024-11-21 07:12:11 +01:00
  • 4d2fbde0cb MMQ for Q6_0 (#115) Kawrakow 2024-11-21 07:12:11 +01:00
  • 2c6d107267 Add Q6_0 MMQ to template generator Iwan Kawrakow 2024-11-20 18:51:30 +02:00
  • fb7ca43080 MMQ for Q6_0 Iwan Kawrakow 2024-11-20 18:47:31 +02:00
  • da9fdf57d8 Faster iq4_k: Metal ik/faster_iq4k Iwan Kawrakow 2024-11-05 09:46:43 +01:00
  • 9d713516cd Faster iq4_k: NEON Iwan Kawrakow 2024-11-05 08:14:42 +01:00
  • 68e6d168a2 Faster iq4_k: CUDA Iwan Kawrakow 2024-11-04 14:37:50 +02:00
  • 48974c7acd iq4_k: Rearrange blocks for faster matrix multiplications Iwan Kawrakow 2024-11-04 10:22:59 +02:00