Commit Graph

  • 410432f566 Simdify q8_K128 quantization also on Neon Iwan Kawrakow 2025-02-09 08:37:53 +02:00
  • 318ce4344b iq1_m_r4: Use Q8_K_128 instead of Q8_0_X4 for gemm (Neon) Iwan Kawrakow 2025-02-09 08:12:28 +02:00
  • 166a157c7a iq1_s_r4: Use Q8_K_128 instead of Q8_1_X4 for gemm (Neon) Iwan Kawrakow 2025-02-09 06:32:21 +02:00
  • 7cdb0a1ec3 Added final optimizations Saood Karim 2025-02-08 13:25:07 -06:00
  • 57d2702bc1 Added missing gguf-py file Saood Karim 2025-02-08 12:55:20 -06:00
  • f0227c4bfa Avoid allocating MHA KV cache when MLA is turned on Saood Karim 2025-02-08 12:47:52 -06:00
  • b6c4ef9a35 iq1_m_r4: Use Q8_K_128 instead of Q8_1_X4 for gemm (AVX2/Zen4) Iwan Kawrakow 2025-02-08 19:15:25 +02:00
  • 6b4d483dd0 iq1_s_r4: Use Q8_K_128 instead of Q8_1_X4 for gemm (AVX2/Zen4) Iwan Kawrakow 2025-02-08 18:14:00 +02:00
  • 0219e22018 Fixes Saood Karim 2025-02-08 09:02:04 -06:00
  • 2a77cb9893 rpc : refactor server Radoslav Gerganov 2024-10-18 10:13:16 +03:00
  • 0f9bdbb45b rpc : refactor backend Radoslav Gerganov 2024-10-17 10:35:19 +03:00
  • bd7bde0adb Fix merge mistake Saood Karim 2025-02-08 04:17:11 -06:00
  • 3aaf602da5 Remove some unnecessary copies in the MLA attention Iwan Kawrakow 2025-02-08 12:08:17 +02:00
  • a0631d8a9a Apply suggestions from code review matt23654 2025-01-03 22:46:35 +00:00
  • d2dd8b4059 Handle potentially dangerous edge cases. matt23654 2025-01-03 22:14:21 +00:00
  • 07c628f4f2 Cleanup and use GGML error logging functions. matt23654 2025-01-03 14:39:32 +00:00
  • a6e460d6cd fix: remove trailing whitespaces. matt23654 2025-01-02 23:28:32 +00:00
  • 14f2d8e20c Cleaned up and improved type/error handling. matt23654 2025-01-02 15:53:50 +00:00
  • 105b40caf7 Added get_alloc_size forwarding matt23654 2025-01-01 03:37:10 +00:00
  • 823cf1ecab Added init tensor calling code matt23654 2024-12-31 21:56:51 +00:00
  • 37c4fbd7f9 Make MLA optional Iwan Kawrakow 2025-02-06 15:10:06 +02:00
  • 96ec90c317 rpc : prevent crashes on invalid input (#9040) Radoslav Gerganov 2024-08-19 10:10:21 +03:00
  • a7a532d5d1 rpc : prevent crashes on invalid input (#9040) Radoslav Gerganov 2024-08-19 10:10:21 +03:00
  • 35246c4e75 Deepseek MLA Optimizations Saood Karim 2025-01-28 13:45:18 -06:00
  • 716508d196 Revert #79 (#192) Kawrakow 2025-02-08 09:48:59 +02:00
  • 6d7b58eade Revert #79 (#192) Kawrakow 2025-02-08 09:48:59 +02:00
  • df226f38c4 Fixed compilation after revert ik/revert_0bf4d997 Iwan Kawrakow 2025-02-07 11:44:52 +02:00
  • 4daff2fdf2 Revert "Do not quantize activations if not necessary (#79)" Iwan Kawrakow 2025-02-07 11:31:41 +02:00
  • bcf45dd5e0 cuda: non-contiguous rms norm (#190) Kawrakow 2025-02-07 08:33:42 +02:00
  • 4601a8c373 cuda: non-contiguous rms norm (#190) Kawrakow 2025-02-07 08:33:42 +02:00
  • becc417718 Add additional checks for iq1_s_r4 quantization (#191) Kawrakow 2025-02-07 08:33:28 +02:00
  • b08a2e9dfc Add additional checks for iq1_s_r4 quantization (#191) Kawrakow 2025-02-07 08:33:28 +02:00
  • 38f2270a15 Add additional checks for iq1_s_r4 quantization ik/iq1_s_checks Iwan Kawrakow 2025-02-07 08:19:58 +02:00
  • 9ac82537dc cuda: non-contiguous rms norm ik/cuda_rms_non_contiguous Iwan Kawrakow 2025-02-06 19:41:17 +02:00
  • 8049ffcbc8 Rename q4_0_r4, q8_0_r4 and iq4_xs_r4 to _r8 (#189) Kawrakow 2025-02-06 18:45:28 +02:00
  • a08501ee52 Rename q4_0_r4, q8_0_r4 and iq4_xs_r4 to _r8 (#189) Kawrakow 2025-02-06 18:45:28 +02:00
  • 5c37edf98e Rename iq4_xs_r4 to iq4_xs_r8 to reflect actual row interleaving ik/rename_4_8 Iwan Kawrakow 2025-02-06 16:46:44 +02:00
  • 224129c1c3 Rename q8_0_r4 to q8_0_r8 to reflect actual row interleaving Iwan Kawrakow 2025-02-06 16:34:27 +02:00
  • f17bdc1194 Rename q4_0_r4 to q4_0_r8 to reflect actual row interleaving Iwan Kawrakow 2025-02-06 16:20:57 +02:00
  • 7c94c3da56 IQ1_M_R4: better 1.75 bpw quants (#187) Kawrakow 2025-02-06 14:08:52 +02:00
  • 7f61b3068e IQ1_M_R4: better 1.75 bpw quants (#187) Kawrakow 2025-02-06 14:08:52 +02:00
  • 54585d6946 iq1_m_r4: rename mul_mat_iq1_m_r4_q8_1 to mul_mat_iq1_m_r4_q8_0 ik/iq1_m_r4 Iwan Kawrakow 2025-02-06 09:56:18 +02:00
  • dbc30e1d27 iq1_m_r4: switch to q8_0_x4 also on AVX2/Zen4 Iwan Kawrakow 2025-02-06 09:13:39 +02:00
  • c12b21b98a iq1_m_r4: neon gemm Iwan Kawrakow 2025-02-06 09:04:24 +02:00
  • b0ba33bec0 iq1_m_r4: Zen4 gemm Iwan Kawrakow 2025-02-05 19:45:45 +02:00
  • 212e8d192e iq1_m_r4: basics (quantize/dequantize) Iwan Kawrakow 2025-02-05 16:59:55 +02:00
  • 1b64fb3ed5 iq1_s_r4: slightly faster NEON gemm/gemv (#186) Kawrakow 2025-02-05 14:45:51 +02:00
  • a6f9f2ec9a iq1_s_r4: slightly faster NEON gemm/gemv (#186) Kawrakow 2025-02-05 14:45:51 +02:00
  • f3c6937fe5 iq1_s_r4: slightly faster NEON gemm/gemv ik/iq1_s_r4_neon Iwan Kawrakow 2025-02-05 14:22:22 +02:00
  • eb547bad1a IQ1_S_R4: better 1.5 bpw quants (#185) Kawrakow 2025-02-05 13:49:39 +02:00
  • 8b7536bda8 IQ1_S_R4: better 1.5 bpw quants (#185) Kawrakow 2025-02-05 13:49:39 +02:00
  • 3c9b116600 Compiler warnings ik/iq1_s_r4 Iwan Kawrakow 2025-02-05 11:12:00 +02:00
  • 56a6ee26bb iq1_s_r4: slightly faster AVX2/Zen4 gemm/gemv Iwan Kawrakow 2025-02-05 10:23:34 +02:00
  • 0467c16a7f Forgotten counter increment Iwan Kawrakow 2025-02-05 08:14:26 +02:00
  • 19d384302b iq1_s_r4: more bits for shared experts Iwan Kawrakow 2025-02-04 19:49:28 +02:00
  • 25ffa8703a iq1_s_r4: NEON gemm/gemv Iwan Kawrakow 2025-02-04 18:14:54 +02:00
  • 16fbe8e14c iq1_s_r4: fix Zen4 after AVX2 changes Iwan Kawrakow 2025-02-04 17:21:29 +02:00
  • b9edce5797 iq1_s_r4: this is better Iwan Kawrakow 2025-02-04 17:06:43 +02:00
  • 83f02e25fc Don't forget to make sure we have a multiple of 4 rows per thread Iwan Kawrakow 2025-02-04 15:25:08 +02:00
  • 6e31b493b3 iq1_s_r4: gemm/gemv works on AVX2/Zen4 Iwan Kawrakow 2025-02-04 14:56:25 +02:00
  • db761f4cec iq1_s_r4: basics - quantize/dequantize Iwan Kawrakow 2025-02-04 11:37:09 +02:00
  • ba470ec1b4 Deepseek-Lite (#184) Kawrakow 2025-01-30 18:36:24 +02:00
  • ecf111a11c Deepseek-Lite (#184) Kawrakow 2025-01-30 18:36:24 +02:00
  • b8966277c0 Make q5_0_r4, q6_0_r4, iq4_nl_r4 work with row sizes that are not a multiple of 128 ik/qmix_tweaks_2 Iwan Kawrakow 2025-01-30 18:29:04 +02:00
  • 2ed550e557 Make q5_0_r4 work with row sizes that are not a multiple of 128 Iwan Kawrakow 2025-01-30 17:49:52 +02:00
  • 2c3ad4f593 Make q6_0_r4 work with row sizes that are not a multiple of 128 Iwan Kawrakow 2025-01-30 17:18:57 +02:00
  • 5e381145f0 Make q6_0_r4 work with row sizes that are not a multiple of 128 Iwan Kawrakow 2025-01-30 16:45:49 +02:00
  • 42fcf4512e Make iq4_nl_r4 work with row sizes that are not a multiple of 128 Iwan Kawrakow 2025-01-30 16:09:05 +02:00
  • 5a279b37ba Make iq4_nl_r4 work with row sizes that are not a multiple of 128 Iwan Kawrakow 2025-01-30 15:29:22 +02:00
  • 4ecba36ebf Make iq4_nl_r4 work with row sizes that are not a multiple of 128 Iwan Kawrakow 2025-01-30 12:57:42 +02:00
  • cc39e3a90c Quantization mixes tweaks Iwan Kawrakow 2025-01-30 11:49:45 +02:00
  • f7a4a0fd42 Faster Q4_K_R4 and Q5_K_R4 on AVX2/Zen4 (#182) Kawrakow 2025-01-30 09:28:53 +02:00
  • 2e6b523853 Faster Q4_K_R4 and Q5_K_R4 on AVX2/Zen4 (#182) Kawrakow 2025-01-30 09:28:53 +02:00
  • 195d7efc8e Cleanup ik/qx_k_b32_avx2 Iwan Kawrakow 2025-01-30 09:24:52 +02:00
  • c7841bbfe6 Minor tweak Iwan Kawrakow 2025-01-30 09:03:14 +02:00
  • 6136d8ebbd Use AVX2 implementation of q4_k_r4 and q5_k_r4 also on Zen4 Iwan Kawrakow 2025-01-29 19:43:52 +02:00
  • 98bbe5d2e5 Faster AVX2 implementation for q5_k_r4 Iwan Kawrakow 2025-01-29 17:33:37 +02:00
  • d07ba6606e Fix llama-bench labels that I broke with #181 Iwan Kawrakow 2025-01-29 16:58:11 +02:00
  • dbe3b5837a Even better AVX2 implementation for q4_k_r4 Iwan Kawrakow 2025-01-29 16:42:07 +02:00
  • 118baf3f73 Slightly faster AVX2 implementation for q4_k_r4 Iwan Kawrakow 2025-01-29 15:32:46 +02:00
  • 5bbe93c0c4 Various (#181) Kawrakow 2025-01-29 14:05:41 +02:00
  • 4a73c25002 Various (#181) Kawrakow 2025-01-29 14:05:41 +02:00
  • 23e90dc325 Make q4_0_r4 work with tensor row sizes that are not a multiple of 128 ik/bench_gp Iwan Kawrakow 2025-01-29 09:55:10 +02:00
  • 80ef71307f Make q4_0_r4 work with tensor row sizes that are not a multiple of 128 Iwan Kawrakow 2025-01-29 08:29:23 +02:00
  • 3b46d3afd5 Make q4_0_r4 work with tensor row sizes that are not a multiple of 128 Iwan Kawrakow 2025-01-29 08:19:08 +02:00
  • 4d7dc72d41 Make q8_0_r4 work with tensor row sizes that are not a multiple of 128 Iwan Kawrakow 2025-01-28 19:59:29 +02:00
  • d3545680b9 Make q8_0_r4 work with tensor row sizes that are not a multiple of 128 Iwan Kawrakow 2025-01-28 19:26:59 +02:00
  • 3c974f5076 Make q8_0_r4 work with tensor row sizes that are not a multiple of 128 Iwan Kawrakow 2025-01-28 17:22:49 +02:00
  • 45de9c82c4 Add gp option to llama-bench Iwan Kawrakow 2025-01-28 11:56:10 +02:00
  • d5b205970f Minor performance improvements (#179) Kawrakow 2025-01-27 18:53:47 +02:00
  • f725576345 Minor performance improvements (#179) Kawrakow 2025-01-27 18:53:47 +02:00
  • b22ed8bc66 Be able to load Deepseek-v2-Lite ik/q4_0_r8 Iwan Kawrakow 2025-01-27 17:47:24 +02:00
  • 2d34c55b6f Merge remote-tracking branch 'origin/main' into ik/q4_0_r8 Iwan Kawrakow 2025-01-27 16:57:34 +02:00
  • f5a09ac6c3 Interleave 8 rows (Q8_0, IQ4_XS) (#178) Kawrakow 2025-01-27 16:50:07 +02:00
  • d9c4ea48d1 Interleave 8 rows (Q8_0, IQ4_XS) (#178) Kawrakow 2025-01-27 16:50:07 +02:00
  • fac48faa21 Process up to 16 columns per kernel call for q8_k_r8 Iwan Kawrakow 2025-01-27 12:39:56 +02:00
  • f1c114d477 Apply platform specific modifications when repacking Iwan Kawrakow 2025-01-27 11:59:30 +02:00
  • 8b3c66063f Apply platform specific modifications when repacking Iwan Kawrakow 2025-01-27 11:12:18 +02:00
  • ee8f966202 q4_0_r8 (Zen4) - slightly better Iwan Kawrakow 2025-01-27 09:19:13 +02:00
  • 17d6c431a3 q4_0_r8 (Zen4) Iwan Kawrakow 2025-01-27 08:08:18 +02:00