Commit Graph

  • 6da5afa3f6 The trellis quants now need super-blocks of 256, so we need a check ik/new_iq2kt_v2 Iwan Kawrakow 2025-06-16 19:18:16 +03:00
  • c201aa7fca iq3_kt: AVX2 GEMV Iwan Kawrakow 2025-06-16 18:41:09 +03:00
  • 6153d0e794 iq3_kt: AVX2 GEMM Iwan Kawrakow 2025-06-16 18:21:33 +03:00
  • f6fa5652a3 iq3_kt: MMQ Iwan Kawrakow 2025-06-16 17:21:17 +03:00
  • 32ff1f956f iq3_kt: use integer trellis + CUDA dequantize and MMVQ Iwan Kawrakow 2025-06-16 16:57:16 +03:00
  • 6d38e43f1d CPU Iwan Kawrakow 2025-06-14 06:19:18 +03:00
  • de4e6c797f Trying @louiehelm's multiplier Iwan Kawrakow 2025-06-13 19:38:11 +03:00
  • 57e882fd84 Add missing break Iwan Kawrakow 2025-06-09 15:18:11 +03:00
  • f2be982fd8 New iq2_kt: Metal - very slow. Iwan Kawrakow 2025-06-09 15:04:23 +03:00
  • d075a1c75b New iq2_kt: slightly faster NEON GEMM Iwan Kawrakow 2025-06-09 12:58:27 +03:00
  • 08067aa7a7 New iq2_kt: NEON GEMM/GEMV Iwan Kawrakow 2025-06-09 12:20:42 +03:00
  • 997239b757 Adding forgotten file Iwan Kawrakow 2025-06-09 10:26:27 +03:00
  • c9800ae62e New iq2_kt: AVX2 GEMM/GEMV Iwan Kawrakow 2025-06-09 09:58:39 +03:00
  • 8db83dac1d New iq2_kt: AVX2 dequantize Iwan Kawrakow 2025-06-09 09:00:56 +03:00
  • 1efb3adc9b New iq2_kt: CUDA GEMV Iwan Kawrakow 2025-06-08 17:51:28 +03:00
  • e095b0fa80 Switching iq2_kt to new trellis - CUDA MMQ Iwan Kawrakow 2025-06-08 17:27:40 +03:00
  • 41187e4a93 Adding forgotten file Iwan Kawrakow 2025-06-09 09:59:39 +03:00
  • 6480fa5967 Cleanup Iwan Kawrakow 2025-06-08 13:44:20 +03:00
  • be78290a23 Remove the extra 4 bytes of row meta data that is no longer used Iwan Kawrakow 2025-06-08 13:36:47 +03:00
  • 8744302858 New iq4_kt trellis: not working Metal implementation Iwan Kawrakow 2025-06-08 11:47:55 +03:00
  • d6ac52c0d7 Minor Iwan Kawrakow 2025-06-08 10:29:40 +03:00
  • 07d6e1d4b1 New iq4_kt: faster NEON Iwan Kawrakow 2025-06-08 10:05:49 +03:00
  • 4102aa998c New iq4_kt: slightly faster NEON Iwan Kawrakow 2025-06-08 09:17:32 +03:00
  • dcb464a4cb New iq4_kt: slightly faster NEON Iwan Kawrakow 2025-06-08 08:43:02 +03:00
  • 68ef8a7ae9 New iq4_kt: NEON implementation Iwan Kawrakow 2025-06-08 08:31:56 +03:00
  • 608e0f497b New iq4_kt: fix vanilla AVX2 Iwan Kawrakow 2025-06-07 19:33:33 +03:00
  • 78411343cc New iq4_kt: AVX2 dot product finally works Iwan Kawrakow 2025-06-07 19:12:09 +03:00
  • 36fba1fff2 Fix iq2_kt that got broken along the way Iwan Kawrakow 2025-06-07 18:51:02 +03:00
  • 6ba96c8b33 For now have only iq4_kt use the new trellis Iwan Kawrakow 2025-06-07 18:35:10 +03:00
  • b5524af7a4 New iq4_kt: CUDA MMQ Iwan Kawrakow 2025-06-07 18:21:43 +03:00
  • 6d6e6e39c9 New iq4_kt: CUDA MMVQ Iwan Kawrakow 2025-06-07 17:43:36 +03:00
  • de0b38dcdc Something is not working with the AVX2 dot product Iwan Kawrakow 2025-06-07 16:18:58 +03:00
  • e558992f0c New iq4_kt trellis Iwan Kawrakow 2025-06-07 12:30:37 +03:00
  • 129b58b150 Much faster CPU prompt processing (part 3) (#534) Kawrakow 2025-06-18 15:30:56 +03:00
  • c410cc72bb Much faster CPU prompt processing (part 3) (#534) Kawrakow 2025-06-18 15:30:56 +03:00
  • 4b6f0ff9c1 q2_K ik/legacy_gemm Iwan Kawrakow 2025-06-18 13:58:31 +03:00
  • 222833d373 iq3_xs Iwan Kawrakow 2025-06-18 12:25:32 +03:00
  • c6048c478a q5_1: 125 t/s -> 253 t/s Iwan Kawrakow 2025-06-18 11:43:09 +03:00
  • 8d393fccf5 q4_1: 135 t/s -> 262 t/s Iwan Kawrakow 2025-06-18 11:30:34 +03:00
  • 0788a71fcf iq4_nl Iwan Kawrakow 2025-06-18 10:33:46 +03:00
  • 516b1d54f1 q6_0 Iwan Kawrakow 2025-06-18 10:23:22 +03:00
  • 6b7cd02abf q5_0 and use a dequantizing template Iwan Kawrakow 2025-06-18 10:00:31 +03:00
  • e6cf67a47c Change q8_2_x4 to store int16_t sums Iwan Kawrakow 2025-06-18 09:43:26 +03:00
  • a16f961033 Repack q4_0 and q8_0 to q8_0_R8 Iwan Kawrakow 2025-06-18 08:46:47 +03:00
  • 57d283de02 Much faster CPU prompt processing (part 2) (#533) Kawrakow 2025-06-18 07:29:33 +03:00
  • dc96820ddb Much faster CPU prompt processing (part 2) (#533) Kawrakow 2025-06-18 07:29:33 +03:00
  • b7744eee27 iq2_ks ik/iqk_gemm Iwan Kawrakow 2025-06-17 17:26:13 +03:00
  • d99606dc1a iq2_k Iwan Kawrakow 2025-06-17 16:25:25 +03:00
  • 8d4e5cbf02 iq3_k Iwan Kawrakow 2025-06-17 15:35:41 +03:00
  • b77b7a82a7 iq6_k Iwan Kawrakow 2025-06-17 13:43:14 +03:00
  • f682afb407 iq5_k - there was a bug with the shifts Iwan Kawrakow 2025-06-17 12:42:06 +03:00
  • 4c00c088d1 iq5_k - accuracy loss is too big Iwan Kawrakow 2025-06-17 11:23:59 +03:00
  • e323a5bbb6 iq5_ks Iwan Kawrakow 2025-06-17 10:44:07 +03:00
  • b8142a583d Send [DONE] for OAI compatibility (#470) Kawrakow 2025-06-17 10:32:53 +03:00
  • 8b3002bba2 Send [DONE] for OAI compatibility (#470) Kawrakow 2025-06-17 10:32:53 +03:00
  • 1e9839a4b3 iq4_k Iwan Kawrakow 2025-06-17 09:39:11 +03:00
  • fa0620f5e7 iq4_ks Iwan Kawrakow 2025-06-17 08:24:16 +03:00
  • 23ac643459 Much faster CPU prompt processing (part 1) (#531) Kawrakow 2025-06-17 07:12:48 +03:00
  • 0f8f8b32e2 Much faster CPU prompt processing (part 1) (#531) Kawrakow 2025-06-17 07:12:48 +03:00
  • 72fd9faa9f Slightly faster ik/q6_k_gemm Iwan Kawrakow 2025-06-16 10:43:44 +03:00
  • c699367fea iq1_m: slightly faster Iwan Kawrakow 2025-06-16 08:03:54 +03:00
  • 4813e458a0 iq1_m: repack to q8_k_r8 Iwan Kawrakow 2025-06-15 20:03:56 +03:00
  • 67632541ce iq1_s: repack to q8_k_r8 Iwan Kawrakow 2025-06-15 19:16:13 +03:00
  • 6760096deb iq3_s: use q8_k_r8 Iwan Kawrakow 2025-06-15 18:29:12 +03:00
  • a2f5c251fb iq3_xxs: repack to q8_k_r8 Iwan Kawrakow 2025-06-15 17:46:14 +03:00
  • 7da3c043e4 iq2_xs: repack to q8_k_r8 Iwan Kawrakow 2025-06-15 15:58:24 +03:00
  • d08b635612 WIP Iwan Kawrakow 2025-06-15 14:50:16 +03:00
  • c2c8d70187 iq2_xs: repack to q8_k_r8 Iwan Kawrakow 2025-06-15 12:31:12 +03:00
  • fc67346225 iq2_s: repack to q8_k_r8 instead of q8_0_r8 Iwan Kawrakow 2025-06-15 11:54:42 +03:00
  • 9fe58aac13 q3_K: don't scale when all quants in a block are <= 127 when repacking Iwan Kawrakow 2025-06-15 10:52:26 +03:00
  • e10f7d1f10 q3_K: repack to q8_k_r8 instead of q8_0_r8 Iwan Kawrakow 2025-06-15 10:37:12 +03:00
  • b22bdd965d Fix q8_k_r8 on Zen4 Iwan Kawrakow 2025-06-15 08:05:55 +03:00
  • c7b8c0f865 q3_K Iwan Kawrakow 2025-06-14 17:37:35 +03:00
  • 649055ec8a iq2_s Iwan Kawrakow 2025-06-14 15:53:04 +03:00
  • 999d7f84b4 Fix AVX2 Iwan Kawrakow 2025-06-14 14:59:34 +03:00
  • a442d69990 iq2_xs Iwan Kawrakow 2025-06-14 14:50:03 +03:00
  • b02a73c1ec Fix AVX2 Iwan Kawrakow 2025-06-14 11:20:35 +03:00
  • 13ce76a85c We don't need the changes in ggml.c Iwan Kawrakow 2025-06-14 10:04:05 +03:00
  • 51560b3656 Very slightly better Iwan Kawrakow 2025-06-14 09:54:29 +03:00
  • 8f415db1f2 Finally q6_K x q8_2_x4 dot product works Iwan Kawrakow 2025-06-14 09:17:03 +03:00
  • bb8ff4e9f0 Call iqk_convert_repack in MoE GEMM (#528) Kawrakow 2025-06-14 05:52:46 +03:00
  • 6fc5bbb657 Call iqk_convert_repack in MoE GEMM (#528) Kawrakow 2025-06-14 05:52:46 +03:00
  • 8d97f53699 Call iqk_convert_repack in MoE GEMM ik/fix_bug_481 Iwan Kawrakow 2025-06-14 05:47:00 +03:00
  • 7cf0d8b7d9 WIP Iwan Kawrakow 2025-06-13 18:43:14 +03:00
  • d454ada64f Much easier: just use different vec_dot types! Iwan Kawrakow 2025-06-13 17:43:28 +03:00
  • 853d581de0 q6_K dequantizing GEMM Iwan Kawrakow 2025-06-13 15:22:03 +03:00
  • b7768e203f Faster CPU prompt processing for Q4_K and Q5_K (#525) Kawrakow 2025-06-13 07:58:15 +03:00
  • 066ed4fd11 Faster CPU prompt processing for Q4_K and Q5_K (#525) Kawrakow 2025-06-13 07:58:15 +03:00
  • ed868d928c Update News section of readme (#510) saood06 2025-06-12 23:56:40 -05:00
  • f72983f7fe Update News section of readme (#510) saood06 2025-06-12 23:56:40 -05:00
  • fb30146ce8 Perhaps a slightly better version for IQ2_XXS, IQ3_XXS, IQ3_S GEMV (#524) Kawrakow 2025-06-13 07:55:57 +03:00
  • 7a882f0b63 Perhaps a slightly better version for IQ2_XXS, IQ3_XXS, IQ3_S GEMV (#524) Kawrakow 2025-06-13 07:55:57 +03:00
  • 7986500f9d Add all IQK quants s6/readme_update Saood Karim 2025-06-12 12:25:43 -05:00
  • ae1c06df66 Add more old PRs Saood Karim 2025-06-12 11:52:25 -05:00
  • 77ad5bbed1 Merge remote-tracking branch 'refs/remotes/origin/s6/readme_update' into s6/readme_update Saood Karim 2025-06-12 11:49:07 -05:00
  • bcbd109bdc Add old PRs Saood Karim 2025-06-12 11:47:39 -05:00
  • 8ba852a2f3 Remove the scales, they are not needed ik/q4_k_gemm Iwan Kawrakow 2025-06-12 13:17:27 +03:00
  • 5432108e9c q5_K: GEMM with q8_2_X4 and repack to q8_1_r8 Iwan Kawrakow 2025-06-12 13:10:24 +03:00
  • 4b8f765870 q4_K: GEMM with q8_2_X4 Iwan Kawrakow 2025-06-12 12:30:23 +03:00
  • 8de4c019d0 q4_K: dequantize to q8_1_r8 for batch >= 32 Iwan Kawrakow 2025-06-12 11:09:51 +03:00