Commit Graph

  • c2d708ae10 Fix JSONL output s6/sweep_bench Saood Karim 2025-02-22 22:30:12 -06:00
  • 55d33a5a91 Fix compilation error with IQK_FA_ALL_QUANTS enabled ik/issue_224 Iwan Kawrakow 2025-02-23 06:12:15 +02:00
  • dfaf65109a Made it compile with ik_llama Saood Karim 2025-02-22 21:50:50 -06:00
  • e7c8b0df6c Change documentation to reference ik_llama.cpp Saood Karim 2025-02-22 20:14:26 -06:00
  • 6bbe43b26f examples : add new sweep-bench benchmark Stanisław Szymczyk 2025-02-06 21:36:31 +01:00
  • 299700a4ec Fix #217 (#220) Kawrakow 2025-02-22 14:25:38 +02:00
  • 4926105844 Fix #217 (#220) Kawrakow 2025-02-22 14:25:38 +02:00
  • 35dc73c828 Remove stuff committed by mistake ik/issue_217 Iwan Kawrakow 2025-02-22 14:24:36 +02:00
  • d555e4199d Fix #217 Iwan Kawrakow 2025-02-22 14:20:47 +02:00
  • a989566f7a Fuse MoE up and gate matrix multiplications (#219) Kawrakow 2025-02-22 09:41:40 +02:00
  • 33646fc409 Fuse MoE up and gate matrix multiplications (#219) Kawrakow 2025-02-22 09:41:40 +02:00
  • a6a2c89d19 Merge remote-tracking branch 'origin/main' into ik/fuse_moe_up_gate ik/fuse_moe_up_gate Iwan Kawrakow 2025-02-22 09:39:46 +02:00
  • dcff697474 Better strategy for attention matrix multiplications when generating tokens (#218) Kawrakow 2025-02-22 09:38:51 +02:00
  • c4a5103299 Better strategy for attention matrix multiplications when generating tokens (#218) Kawrakow 2025-02-22 09:38:51 +02:00
  • 216ea5890d Fuse up and gate gemms in MoE models Iwan Kawrakow 2025-02-22 08:26:26 +02:00
  • af790bb5fa Cleanup ik/attn_gemm Iwan Kawrakow 2025-02-22 06:09:54 +02:00
  • 85aa6df69c This seems to be a better way Iwan Kawrakow 2025-02-21 16:37:53 +02:00
  • 17d43879c6 Hopefully this really fixes the confusion between AVX512 and FANCY_SIMD (#216) Kawrakow 2025-02-21 15:33:25 +02:00
  • b9a6639ac3 Hopefully this really fixes the confusion between AVX512 and FANCY_SIMD (#216) Kawrakow 2025-02-21 15:33:25 +02:00
  • ffe65511cf Hopefully this really fixes the confusion between AVX512 and FANCY_SIMD ik/fix_avx512_vs_fancy_simd Iwan Kawrakow 2025-02-21 12:30:33 +02:00
  • 9ccc810b08 Trying to fix confusion between HAVE_FANCY_SIMD and AVX512 ik/issue_214 Iwan Kawrakow 2025-02-21 09:06:57 +02:00
  • 5a196198b6 Honor attn_output specified in the command line also for low-bit quants Kawrakow 2025-02-20 17:42:07 +02:00
  • 4b45b82e67 Honor attn_output specified in the command line also for low-bit quants Iwan Kawrakow 2025-02-20 17:42:07 +02:00
  • 46f23397d1 Fix NEON gemm/gemv for legacy quants when row size is not divisible by 128 (#213) Kawrakow 2025-02-20 13:55:13 +02:00
  • a45da7bfbf Fix NEON gemm/gemv for legacy quants when row size is not divisible by 128 (#213) Kawrakow 2025-02-20 13:55:13 +02:00
  • 75e11382a6 Fix typo ik/fix_neon_legacy_quants Iwan Kawrakow 2025-02-20 13:54:28 +02:00
  • 667814da46 Fix gemm/gemv for legacy quants when row size is not divisible by 128 Iwan Kawrakow 2025-02-20 13:44:33 +02:00
  • 5fc4676522 Optimized GEMM/GEMV for IQ1_S (#212) Kawrakow 2025-02-20 12:41:45 +02:00
  • 498a582919 Optimized GEMM/GEMV for IQ1_S (#212) Kawrakow 2025-02-20 12:41:45 +02:00
  • 74c26d05c2 iq1s: NEON ik/gemm_iq1s Iwan Kawrakow 2025-02-20 11:33:36 +02:00
  • a5be8c8626 iq1_s: AVX2 Iwan Kawrakow 2025-02-20 08:46:41 +02:00
  • 189a9a6ba0 iq1_s: slightly better on Zen4 Iwan Kawrakow 2025-02-20 07:39:42 +02:00
  • 8bf80a09e2 Adding iq1_s to iqk_mul_mat (Zen4) Iwan Kawrakow 2025-02-19 20:04:53 +02:00
  • 1140b4568d Q8_KV: 8-bit quantization type targeting the KV cache (#208) Kawrakow 2025-02-19 11:47:07 +02:00
  • a0ebfdd661 Q8_KV: 8-bit quantization type targeting the KV cache (#208) Kawrakow 2025-02-19 11:47:07 +02:00
  • 4502eab09e Minor ik/q8_KV Iwan Kawrakow 2025-02-19 10:42:15 +02:00
  • 9236c82244 q8_KV: nrc_y = 16 also doesn't pay off in FA Iwan Kawrakow 2025-02-18 18:15:23 +02:00
  • e08e292bea q8_KV_r8: don't use nrc_y = 16 on Zen4 Iwan Kawrakow 2025-02-18 16:39:45 +02:00
  • 2b9526c8b6 q8_KV_r8 - repacked q8_KV Iwan Kawrakow 2025-02-18 16:25:25 +02:00
  • c82f4194c3 q8_KV: use it in FA on NEON Iwan Kawrakow 2025-02-18 13:53:01 +02:00
  • 58c13d0574 q8_KV: ARM_NEON Iwan Kawrakow 2025-02-18 13:18:48 +02:00
  • 10168ab532 q8_KV: slightly faster gemv on Zen4 Iwan Kawrakow 2025-02-18 11:06:57 +02:00
  • 1ecea16f63 q8_KV: slightly faster gemv on Zen4 Iwan Kawrakow 2025-02-18 10:40:50 +02:00
  • 7f4ec2f964 q8_KV: repack it for K*Q in FA Iwan Kawrakow 2025-02-17 19:03:49 +02:00
  • 8f004a0759 q8_KV: be able to use it for K cache in FA Iwan Kawrakow 2025-02-17 17:10:28 +02:00
  • 0280b8d52b q8_KV: be able to use it for K cache Iwan Kawrakow 2025-02-17 15:19:05 +02:00
  • a4ffe2e69e q8_KV: AVX2 gemm/gemv Iwan Kawrakow 2025-02-17 12:50:44 +02:00
  • 0d7885f081 q8_KV: Better Zen4 gemm Iwan Kawrakow 2025-02-17 12:16:55 +02:00
  • 7979f85142 q8_KV: Better AVX2 gemm Iwan Kawrakow 2025-02-17 11:49:08 +02:00
  • 538876471b Adding q8_KV - Basics + AVX2 gemm/gemv Iwan Kawrakow 2025-02-17 10:55:47 +02:00
  • 9c74d3ef12 Repack also experts (#210) Kawrakow 2025-02-19 10:01:49 +02:00
  • 047ba895bb Repack also experts (#210) Kawrakow 2025-02-19 10:01:49 +02:00
  • 7d020d8681 Repack also experts ik/repack_also_experts Iwan Kawrakow 2025-02-19 09:54:48 +02:00
  • 6b809ca0e1 Bug fix in activation quantization Kawrakow 2025-02-15 19:50:53 +02:00
  • d44aba79ea Bug fix in activation quantization Iwan Kawrakow 2025-02-15 19:50:53 +02:00
  • 149d0d5768 Moving 4D gemm logic from ggml.c to iqk_mul_mat.cpp (#207) Kawrakow 2025-02-15 08:45:45 +02:00
  • 0551e7630b Moving 4D gemm logic from ggml.c to iqk_mul_mat.cpp (#207) Kawrakow 2025-02-15 08:45:45 +02:00
  • bdc882bfac Moving 4D gemm logic from ggml.c to iqk_mul_mat.cpp ik/gemm_4d Iwan Kawrakow 2025-02-14 18:03:19 +02:00
  • 51e13ee97a MLA: allow Q8_0 K-cache for MLA (#206) Kawrakow 2025-02-13 14:44:33 +02:00
  • 8e94b29e35 MLA: allow Q8_0 K-cache for MLA (#206) Kawrakow 2025-02-13 14:44:33 +02:00
  • a0ee859784 MLA: allow Q8_0 K-cache for MLA ik/mla_q80 Iwan Kawrakow 2025-02-13 12:44:05 +02:00
  • cbe2bca1e6 Faster MLA prompt processing (#205) Kawrakow 2025-02-13 11:50:20 +02:00
  • 05242ff17d Faster MLA prompt processing (#205) Kawrakow 2025-02-13 11:50:20 +02:00
  • f875ed00e8 MLA: compile time option to not use transposed KV cache ik/mla_fixes Iwan Kawrakow 2025-02-13 10:54:42 +02:00
  • 91db234fb5 Warn user when disabling MLA Iwan Kawrakow 2025-02-13 08:40:24 +02:00
  • 00063b7d99 WIP Iwan Kawrakow 2025-02-12 18:38:08 +02:00
  • 00dcb0cfa7 WIP Iwan Kawrakow 2025-02-12 15:29:59 +02:00
  • 10c31d3feb Fix iqk_mul_mat on AVX512 systems that are missing BF16 support (#204) Kawrakow 2025-02-12 14:22:26 +02:00
  • 1bbb543478 Fix iqk_mul_mat on AVX512 systems that are missing BF16 support (#204) Kawrakow 2025-02-12 14:22:26 +02:00
  • 8861e7a4ef One more ik/fix_missing_bf16_avx512 Iwan Kawrakow 2025-02-12 13:56:54 +02:00
  • cfee1a0b91 WIP Iwan Kawrakow 2025-02-12 13:53:38 +02:00
  • 86808719c4 Fix iqk_mul_mat on AVX512 systems that are missing BF16 support Iwan Kawrakow 2025-02-12 10:15:22 +02:00
  • 8438b16281 WIP Iwan Kawrakow 2025-02-12 10:04:33 +02:00
  • 54252d0256 Rename X_pe to X_rope Iwan Kawrakow 2025-02-12 07:49:16 +02:00
  • 978aaa9f68 Do not allocate / report caches that are not used Iwan Kawrakow 2025-02-12 07:41:53 +02:00
  • 1ac5fd1aed Fix imatrix overprotectiveness (#202) Kawrakow 2025-02-12 07:20:38 +02:00
  • e974fc9e66 Fix imatrix overprotectiveness (#202) Kawrakow 2025-02-12 07:20:38 +02:00
  • 1044af2d95 Fix imatrix overprotectiveness ik/fix_imatrix_nonsense Iwan Kawrakow 2025-02-11 18:05:57 +02:00
  • e57440472e DeepSeek FA support (CPU only) (#200) Kawrakow 2025-02-11 14:46:30 +02:00
  • 3c98bfb33d DeepSeek FA support (CPU only) (#200) Kawrakow 2025-02-11 14:46:30 +02:00
  • 4066235b8f FA: very slightly faster for nq = 1 (TG) ik/fattn_Dk_Dv Iwan Kawrakow 2025-02-10 18:25:44 +02:00
  • 10815e7ebe iqk support for K head size != V head size Iwan Kawrakow 2025-02-10 16:00:25 +02:00
  • a00bb54c61 Adding support for K head size != V head size Iwan Kawrakow 2025-02-10 15:14:03 +02:00
  • a2676d5904 Load all MoE experts during warmup and make warmup 1 token (#198) saood06 2025-02-10 09:40:38 -06:00
  • a366a3d17d Load all MoE experts during warmup and make warmup 1 token (#198) saood06 2025-02-10 09:40:38 -06:00
  • 370274317b Unify warmup to one token s6/warmup Saood Karim 2025-02-09 16:05:16 -06:00
  • ca4e8e5346 Load all MoE experts during warmup Saood Karim 2025-02-09 15:32:38 -06:00
  • c13027bcaf Merge remote-tracking branch 'origin/main' into ik/try_trellis ik/try_trellis Iwan Kawrakow 2025-02-09 20:00:41 +02:00
  • 3e536b95b0 Add optional MLA (#188) Kawrakow 2025-02-09 19:48:44 +02:00
  • c12f73ba61 Add optional MLA (#188) Kawrakow 2025-02-09 19:48:44 +02:00
  • db7eabb111 FA: Add option to build all FA kernels (#197) Kawrakow 2025-02-09 18:59:33 +02:00
  • cae2b81155 FA: Add option to build all FA kernels (#197) Kawrakow 2025-02-09 18:59:33 +02:00
  • 01e2b0c2ce FA: Add option to build all FA kernels ik/iqk_fattn_all_quants Iwan Kawrakow 2025-02-09 18:50:50 +02:00
  • c6cfbc79a9 Better gemm strategy when nth > nhead ik/mla Iwan Kawrakow 2025-02-09 13:03:18 +02:00
  • 17e810142f Use type_k and type_v to set the types of the MLA caches Iwan Kawrakow 2025-02-09 11:13:13 +02:00
  • d58dee869a Deepseek MLA Optimizations V2 (#195) saood06 2025-02-09 01:36:54 -06:00
  • bf1d056125 Make sure we do have wk_b and wv_b before enabling MLA s6/mla Iwan Kawrakow 2025-02-09 09:24:52 +02:00
  • 6658922b94 Use Q8_K_128 for IQ1_S_R4 and IQ1_M_R4 matrix multiplications (#194) Kawrakow 2025-02-09 09:14:52 +02:00
  • 33390c4b74 Use Q8_K_128 for IQ1_S_R4 and IQ1_M_R4 matrix multiplications (#194) Kawrakow 2025-02-09 09:14:52 +02:00
  • 03dc7bd787 Cleanup ik/iq1_s_r4_k128 Iwan Kawrakow 2025-02-09 09:14:12 +02:00