Commit Graph

  • dcb464a4cb New iq4_kt: slightly faster NEON Iwan Kawrakow 2025-06-08 08:43:02 +03:00
  • 68ef8a7ae9 New iq4_kt: NEON implementation Iwan Kawrakow 2025-06-08 08:31:56 +03:00
  • 608e0f497b New iq4_kt: fix vanilla AVX2 Iwan Kawrakow 2025-06-07 19:33:33 +03:00
  • 78411343cc New iq4_kt: AVX2 dot product finally works Iwan Kawrakow 2025-06-07 19:12:09 +03:00
  • 36fba1fff2 Fix iq2_kt that got broken along the way Iwan Kawrakow 2025-06-07 18:51:02 +03:00
  • 6ba96c8b33 For now have only iq4_kt use the new trellis Iwan Kawrakow 2025-06-07 18:35:10 +03:00
  • b5524af7a4 New iq4_kt: CUDA MMQ Iwan Kawrakow 2025-06-07 18:21:43 +03:00
  • 6d6e6e39c9 New iq4_kt: CUDA MMVQ Iwan Kawrakow 2025-06-07 17:43:36 +03:00
  • de0b38dcdc Something is not working with the AVX2 dot product Iwan Kawrakow 2025-06-07 16:18:58 +03:00
  • e558992f0c New iq4_kt trellis Iwan Kawrakow 2025-06-07 12:30:37 +03:00
  • 129b58b150 Much faster CPU prompt processing (part 3) (#534) Kawrakow 2025-06-18 15:30:56 +03:00
  • c410cc72bb Much faster CPU prompt processing (part 3) (#534) Kawrakow 2025-06-18 15:30:56 +03:00
  • 4b6f0ff9c1 q2_K ik/legacy_gemm Iwan Kawrakow 2025-06-18 13:58:31 +03:00
  • 222833d373 iq3_xs Iwan Kawrakow 2025-06-18 12:25:32 +03:00
  • c6048c478a q5_1: 125 t/s -> 253 t/s Iwan Kawrakow 2025-06-18 11:43:09 +03:00
  • 8d393fccf5 q4_1: 135 t/s -> 262 t/s Iwan Kawrakow 2025-06-18 11:30:34 +03:00
  • 0788a71fcf iq4_nl Iwan Kawrakow 2025-06-18 10:33:46 +03:00
  • 516b1d54f1 q6_0 Iwan Kawrakow 2025-06-18 10:23:22 +03:00
  • 6b7cd02abf q5_0 and use a dequantizing template Iwan Kawrakow 2025-06-18 10:00:31 +03:00
  • e6cf67a47c Change q8_2_x4 to store int16_t sums Iwan Kawrakow 2025-06-18 09:43:26 +03:00
  • a16f961033 Repack q4_0 and q8_0 to q8_0_R8 Iwan Kawrakow 2025-06-18 08:46:47 +03:00
  • 57d283de02 Much faster CPU prompt processing (part 2) (#533) Kawrakow 2025-06-18 07:29:33 +03:00
  • dc96820ddb Much faster CPU prompt processing (part 2) (#533) Kawrakow 2025-06-18 07:29:33 +03:00
  • b7744eee27 iq2_ks ik/iqk_gemm Iwan Kawrakow 2025-06-17 17:26:13 +03:00
  • d99606dc1a iq2_k Iwan Kawrakow 2025-06-17 16:25:25 +03:00
  • 8d4e5cbf02 iq3_k Iwan Kawrakow 2025-06-17 15:35:41 +03:00
  • b77b7a82a7 iq6_k Iwan Kawrakow 2025-06-17 13:43:14 +03:00
  • f682afb407 iq5_k - there was a bug with the shifts Iwan Kawrakow 2025-06-17 12:42:06 +03:00
  • 4c00c088d1 iq5_k - accuracy loss is too big Iwan Kawrakow 2025-06-17 11:23:59 +03:00
  • e323a5bbb6 iq5_ks Iwan Kawrakow 2025-06-17 10:44:07 +03:00
  • b8142a583d Send [DONE] for OAI compatibility (#470) Kawrakow 2025-06-17 10:32:53 +03:00
  • 8b3002bba2 Send [DONE] for OAI compatibility (#470) Kawrakow 2025-06-17 10:32:53 +03:00
  • 1e9839a4b3 iq4_k Iwan Kawrakow 2025-06-17 09:39:11 +03:00
  • fa0620f5e7 iq4_ks Iwan Kawrakow 2025-06-17 08:24:16 +03:00
  • 23ac643459 Much faster CPU prompt processing (part 1) (#531) Kawrakow 2025-06-17 07:12:48 +03:00
  • 0f8f8b32e2 Much faster CPU prompt processing (part 1) (#531) Kawrakow 2025-06-17 07:12:48 +03:00
  • 72fd9faa9f Slightly faster ik/q6_k_gemm Iwan Kawrakow 2025-06-16 10:43:44 +03:00
  • c699367fea iq1_m: slightly faster Iwan Kawrakow 2025-06-16 08:03:54 +03:00
  • 4813e458a0 iq1_m: repack to q8_k_r8 Iwan Kawrakow 2025-06-15 20:03:56 +03:00
  • 67632541ce iq1_s: repack to q8_k_r8 Iwan Kawrakow 2025-06-15 19:16:13 +03:00
  • 6760096deb iq3_s: use q8_k_r8 Iwan Kawrakow 2025-06-15 18:29:12 +03:00
  • a2f5c251fb iq3_xxs: repack to q8_k_r8 Iwan Kawrakow 2025-06-15 17:46:14 +03:00
  • 7da3c043e4 iq2_xs: repack to q8_k_r8 Iwan Kawrakow 2025-06-15 15:58:24 +03:00
  • d08b635612 WIP Iwan Kawrakow 2025-06-15 14:50:16 +03:00
  • c2c8d70187 iq2_xs: repack to q8_k_r8 Iwan Kawrakow 2025-06-15 12:31:12 +03:00
  • fc67346225 iq2_s: repack to q8_k_r8 instead of q8_0_r8 Iwan Kawrakow 2025-06-15 11:54:42 +03:00
  • 9fe58aac13 q3_K: don't scale when all quants in a block are <= 127 when repacking Iwan Kawrakow 2025-06-15 10:52:26 +03:00
  • e10f7d1f10 q3_K: repack to q8_k_r8 instead of q8_0_r8 Iwan Kawrakow 2025-06-15 10:37:12 +03:00
  • b22bdd965d Fix q8_k_r8 on Zen4 Iwan Kawrakow 2025-06-15 08:05:55 +03:00
  • c7b8c0f865 q3_K Iwan Kawrakow 2025-06-14 17:37:35 +03:00
  • 649055ec8a iq2_s Iwan Kawrakow 2025-06-14 15:53:04 +03:00
  • 999d7f84b4 Fix AVX2 Iwan Kawrakow 2025-06-14 14:59:34 +03:00
  • a442d69990 iq2_xs Iwan Kawrakow 2025-06-14 14:50:03 +03:00
  • b02a73c1ec Fix AVX2 Iwan Kawrakow 2025-06-14 11:20:35 +03:00
  • 13ce76a85c We don't need the changes in ggml.c Iwan Kawrakow 2025-06-14 10:04:05 +03:00
  • 51560b3656 Very slightly better Iwan Kawrakow 2025-06-14 09:54:29 +03:00
  • 8f415db1f2 Finally q6_K x q8_2_x4 dot product works Iwan Kawrakow 2025-06-14 09:17:03 +03:00
  • bb8ff4e9f0 Call iqk_convert_repack in MoE GEMM (#528) Kawrakow 2025-06-14 05:52:46 +03:00
  • 6fc5bbb657 Call iqk_convert_repack in MoE GEMM (#528) Kawrakow 2025-06-14 05:52:46 +03:00
  • 8d97f53699 Call iqk_convert_repack in MoE GEMM ik/fix_bug_481 Iwan Kawrakow 2025-06-14 05:47:00 +03:00
  • 7cf0d8b7d9 WIP Iwan Kawrakow 2025-06-13 18:43:14 +03:00
  • d454ada64f Much easier: just use different vec_dot types! Iwan Kawrakow 2025-06-13 17:43:28 +03:00
  • 853d581de0 q6_K dequantizing GEMM Iwan Kawrakow 2025-06-13 15:22:03 +03:00
  • b7768e203f Faster CPU prompt processing for Q4_K and Q5_K (#525) Kawrakow 2025-06-13 07:58:15 +03:00
  • 066ed4fd11 Faster CPU prompt processing for Q4_K and Q5_K (#525) Kawrakow 2025-06-13 07:58:15 +03:00
  • ed868d928c Update News section of readme (#510) saood06 2025-06-12 23:56:40 -05:00
  • f72983f7fe Update News section of readme (#510) saood06 2025-06-12 23:56:40 -05:00
  • fb30146ce8 Perhaps a slightly better version for IQ2_XXS, IQ3_XXS, IQ3_S GEMV (#524) Kawrakow 2025-06-13 07:55:57 +03:00
  • 7a882f0b63 Perhaps a slightly better version for IQ2_XXS, IQ3_XXS, IQ3_S GEMV (#524) Kawrakow 2025-06-13 07:55:57 +03:00
  • 7986500f9d Add all IQK quants s6/readme_update Saood Karim 2025-06-12 12:25:43 -05:00
  • ae1c06df66 Add more old PRs Saood Karim 2025-06-12 11:52:25 -05:00
  • 77ad5bbed1 Merge remote-tracking branch 'refs/remotes/origin/s6/readme_update' into s6/readme_update Saood Karim 2025-06-12 11:49:07 -05:00
  • bcbd109bdc Add old PRs Saood Karim 2025-06-12 11:47:39 -05:00
  • 8ba852a2f3 Remove the scales, they are not needed ik/q4_k_gemm Iwan Kawrakow 2025-06-12 13:17:27 +03:00
  • 5432108e9c q5_K: GEMM with q8_2_X4 and repack to q8_1_r8 Iwan Kawrakow 2025-06-12 13:10:24 +03:00
  • 4b8f765870 q4_K: GEMM with q8_2_X4 Iwan Kawrakow 2025-06-12 12:30:23 +03:00
  • 8de4c019d0 q4_K: dequantize to q8_1_r8 for batch >= 32 Iwan Kawrakow 2025-06-12 11:09:51 +03:00
  • dc663fe632 Better strategy for GPU offload (#520) Kawrakow 2025-06-12 19:25:11 +03:00
  • b57bd8658b Better strategy for GPU offload (#520) Kawrakow 2025-06-12 19:25:11 +03:00
  • 1a8a0e5e63 Perhaps a slightly better version for IQ2_XXS, IQ3_XXS, IQ3_S GEMV ik/iq_gemv_tweaks Iwan Kawrakow 2025-06-12 19:09:31 +03:00
  • 07777dde1f Add top n sigma sampler and other webui fix (#512) firecoperana 2025-06-12 00:19:26 -05:00
  • 7b1a3eece7 Add top n sigma sampler and other webui fix (#512) firecoperana 2025-06-12 00:19:26 -05:00
  • baa412aa42 iq3_s: much faster GEMM via repacking to q8_0_r8 (#518) Kawrakow 2025-06-12 08:16:12 +03:00
  • 4fc3cb4a47 iq3_s: much faster GEMM via repacking to q8_0_r8 (#518) Kawrakow 2025-06-12 08:16:12 +03:00
  • cdcb324fe6 Better strategy for GPU offload ik/moe_offload_strategy Iwan Kawrakow 2025-06-11 19:44:05 +03:00
  • ec530d4e5f iq3_s: much faster GEMM via repacking to q8_0_r8 ik/iq3_s_gemm Iwan Kawrakow 2025-06-11 16:19:54 +03:00
  • bbf3f20df1 Faster iq1_s GEMM via repacking to Q8_0_R8 (#517) Kawrakow 2025-06-11 15:01:34 +03:00
  • 3f54b49786 Faster iq1_s GEMM via repacking to Q8_0_R8 (#517) Kawrakow 2025-06-11 15:01:34 +03:00
  • 3d5672073f Faster iq1_s GEMM via repacking to Q8_0_R8 ik/iq1_s_gemm Iwan Kawrakow 2025-06-11 14:38:48 +03:00
  • c9fd42520e Much faster iq3_xxs GEMM via repacking to q8_0_r8 (AVX2) (#516) Kawrakow 2025-06-11 13:05:26 +03:00
  • 69af3f5990 Much faster iq3_xxs GEMM via repacking to q8_0_r8 (AVX2) (#516) Kawrakow 2025-06-11 13:05:26 +03:00
  • be3b768c9a Much faster iq3_xxs GEMM via repacking to q8_0_r8 (AVX2) ik/iq3_xxs_gemm Iwan Kawrakow 2025-06-11 12:45:59 +03:00
  • bdf1b34493 IQ2_XXS: much faster CPU prompt processing (#515) Kawrakow 2025-06-11 11:12:30 +03:00
  • e56061fa12 IQ2_XXS: much faster CPU prompt processing (#515) Kawrakow 2025-06-11 11:12:30 +03:00
  • 415a7cf6c3 NEON is not working yet, so still use Q8_K GEMM ik/iq2_xxs_gemm Iwan Kawrakow 2025-06-11 10:55:42 +03:00
  • bed683fd09 Cleanup Iwan Kawrakow 2025-06-11 10:36:41 +03:00
  • 6838a3c07e requested changes saood06 2025-06-11 01:17:18 -05:00
  • 9ae91dcf7b more minor fixes Saood Karim 2025-06-10 23:55:31 -05:00
  • 7669d15fcb Add more links and minor fix Saood Karim 2025-06-10 23:48:15 -05:00
  • 8852263074 Update with new ones Saood Karim 2025-06-10 23:42:06 -05:00