Commit Graph

  • 45bd3ea340 iq3_k: NEON Iwan Kawrakow 2024-07-30 19:17:12 +02:00
  • bfef3dc584 iq3_k: AVX2 iqk_mul_mat Iwan Kawrakow 2024-07-30 19:01:35 +03:00
  • bfe625f6ea iq3_k: AVX512 iqk_mul_mat Iwan Kawrakow 2024-07-30 18:40:10 +03:00
  • 366441bd75 iq3_k: faster CUDA dot product Iwan Kawrakow 2024-07-30 17:18:31 +03:00
  • 8d2b5bae2b iq3_k: CUDA dot product Iwan Kawrakow 2024-07-30 16:57:39 +03:00
  • c828bd3c57 iq3_k: Basics Iwan Kawrakow 2024-07-30 16:11:25 +03:00
  • 92b3e1c034 iq2_k: very slightly better CUDA dot product Iwan Kawrakow 2024-07-30 15:17:21 +03:00
  • 79cd04b2a6 iq2_k: better CUDA dot product Iwan Kawrakow 2024-07-30 12:49:55 +03:00
  • 534d3bbc48 iq2_k: CUDA dot product finally works Iwan Kawrakow 2024-07-30 12:33:48 +03:00
  • a26688841f iq5_k: CUDA dot product finally works Iwan Kawrakow 2024-07-30 11:22:08 +03:00
  • 262fe6d9d2 Factor out iqk CUDA dot products Iwan Kawrakow 2024-07-30 09:14:18 +03:00
  • 3806ee285f iq5_k: CUDA dot product still not working Iwan Kawrakow 2024-07-29 21:01:05 +03:00
  • d2b51308d3 iq5_k: Metal Iwan Kawrakow 2024-07-29 16:47:06 +02:00
  • cc0493f1ff iq5_k: NEON Iwan Kawrakow 2024-07-29 13:57:14 +02:00
  • eb5bd49f10 iq5_k: AVX512 Iwan Kawrakow 2024-07-29 14:26:14 +03:00
  • 1704a34e6c iq5_k: AVX2 Iwan Kawrakow 2024-07-29 13:39:50 +03:00
  • f0836bdbbe iq5_k: Basics Iwan Kawrakow 2024-07-29 12:38:46 +03:00
  • 1c2d026da6 iq2_k: Metal. Dot product is wrong Iwan Kawrakow 2024-07-29 09:49:32 +02:00
  • 89b410dfb7 iq2_k: NEON Iwan Kawrakow 2024-07-29 07:26:36 +02:00
  • 972f134e88 iq2_k: slightly faster AVX512 Iwan Kawrakow 2024-07-29 06:51:44 +03:00
  • d07c58b4b7 iq2_k: simplify AVX512 Iwan Kawrakow 2024-07-28 21:05:56 +03:00
  • 3555a3d8ba iq2_k: AVX2 Iwan Kawrakow 2024-07-28 20:50:21 +03:00
  • 76449533f2 iq2_k: Basics Iwan Kawrakow 2024-07-28 19:43:18 +03:00
  • 007d2a56b3 IQ4_K: SOTA 4-bit quantization (#6) Kawrakow 2024-07-28 12:11:59 +02:00
  • 291066e6df IQ4_K: SOTA 4-bit quantization (#6) Kawrakow 2024-07-28 12:11:59 +02:00
  • b29f64ea70 iq4_k: scalar dot product ik/iq4_k Iwan Kawrakow 2024-07-28 12:09:28 +02:00
  • 8ffb6452d4 iq4_k: Metal implementation Iwan Kawrakow 2024-07-28 10:04:41 +02:00
  • d89c88e8df iq4_k: NEON implementation Iwan Kawrakow 2024-07-28 08:36:20 +02:00
  • db87f766e8 iq4_k: AVX2 implementation Iwan Kawrakow 2024-07-27 21:10:22 +03:00
  • be34f768db iq4_k: AVX512 implementation Iwan Kawrakow 2024-07-27 20:13:30 +03:00
  • 41d20f6bb5 iq4_k: TG now works on CUDA Iwan Kawrakow 2024-07-27 18:02:59 +03:00
  • 8a2d43813d iq4_k: basics Iwan Kawrakow 2024-07-27 17:05:31 +03:00
  • 473e280500 Fusing a mat mul op followed by scale op on the CPU ik/fuse_mul_mat_scale Iwan Kawrakow 2024-07-27 10:45:56 +03:00
  • 8963f383c0 Simdify and multi-thread tanh (#4) Kawrakow 2024-07-27 08:44:18 +02:00
  • f62615b44f Simdify and multi-thread tanh (#4) Kawrakow 2024-07-27 08:44:18 +02:00
  • 0ceeb11721 Merge mainline llama.cpp (#3) Kawrakow 2024-07-27 07:55:01 +02:00
  • 154e0d75fc Merge mainline llama.cpp (#3) Kawrakow 2024-07-27 07:55:01 +02:00
  • 573e5007cd Remove check ik/merge_July_26_2024 Iwan Kawrakow 2024-07-26 18:00:26 +03:00
  • ddd97dccc8 Merging mainline - fix Metal Iwan Kawrakow 2024-07-26 16:41:16 +02:00
  • a0849e49f9 Merging mainline - WIP Iwan Kawrakow 2024-07-26 17:16:21 +03:00
  • 6b2b52d2fe Merging mainline - WIP Iwan Kawrakow 2024-07-26 16:32:40 +03:00
  • afd9fd274e Offload Bitnet token embeddings to the GPU - the right way (#2) Kawrakow 2024-07-26 12:57:23 +02:00
  • 0684c3e9c7 Offload Bitnet token embeddings to the GPU - the right way (#2) Kawrakow 2024-07-26 12:57:23 +02:00
  • ccdb948329 Offload Bitnet token embeddings to the GPU - the right way ik/bitnet_token_embedding_gpu_2 Iwan Kawrakow 2024-07-26 13:50:41 +03:00
  • a14a9426ec Offload Bitnet token embeddings to the GPU (#1) Kawrakow 2024-07-26 09:41:04 +02:00
  • 94b5916319 Offload Bitnet token embeddings to the GPU (#1) Kawrakow 2024-07-26 09:41:04 +02:00
  • db6b0f6dab Update README with the new CUDA/Meat performance ik/bitnet_token_embedding_gpu Iwan Kawrakow 2024-07-26 09:06:22 +02:00
  • fbafe0989f bitnet: put token embeddings on the GPU Iwan Kawrakow 2024-07-26 09:50:52 +03:00
  • 4673de8cbe iqk_mul_mat(NEON): adding forgotten fp16 matrix x vector implementation Kawrakow 2024-07-25 08:37:13 +02:00
  • c2158c15d9 iqk_mul_mat(NEON): adding forgotten fp16 matrix x vector implementation Iwan Kawrakow 2024-07-25 08:37:13 +02:00
  • 5626b09e4b Update README.md Kawrakow 2024-07-24 19:55:06 +02:00
  • 28fb349db4 Update README.md Kawrakow 2024-07-24 19:55:06 +02:00
  • ddaae42194 Update README.md Kawrakow 2024-07-24 19:44:52 +02:00
  • eb246cd0ae Update README.md Kawrakow 2024-07-24 19:44:52 +02:00
  • 914b7ef460 Update README.md Kawrakow 2024-07-24 19:20:46 +02:00
  • fc07ca7847 Update README.md Kawrakow 2024-07-24 19:20:46 +02:00
  • 010466af1e Add copyright notices Kawrakow 2024-07-24 20:11:42 +03:00
  • 770f3585c2 Add copyright notices Iwan Kawrakow 2024-07-24 20:11:42 +03:00
  • e0b2dd511c Remove unused file Kawrakow 2024-07-24 19:33:19 +03:00
  • 9eee03f4ee Remove unused file Iwan Kawrakow 2024-07-24 19:33:19 +03:00
  • 6fd0a92cb0 Remove security Kawrakow 2024-07-24 19:25:21 +03:00
  • 3d83f58654 Remove security Iwan Kawrakow 2024-07-24 19:25:21 +03:00
  • 28b4229295 Correct spelling in README Kawrakow 2024-07-24 19:22:43 +03:00
  • b64275ca4e Correct spelling in README Iwan Kawrakow 2024-07-24 19:22:43 +03:00
  • b84d0c1744 Update README.md Kawrakow 2024-07-24 17:38:37 +02:00
  • 4192244242 Update README.md Kawrakow 2024-07-24 17:38:37 +02:00
  • de43999de5 Update README.md Kawrakow 2024-07-24 16:49:00 +02:00
  • 47c1243e3c Update README.md Kawrakow 2024-07-24 16:49:00 +02:00
  • cd77618324 Update README.md Kawrakow 2024-07-24 11:18:50 +02:00
  • 8fe7e04456 Update README.md Kawrakow 2024-07-24 11:18:50 +02:00
  • 4bb58ea8f8 Update README.md Kawrakow 2024-07-24 11:01:16 +02:00
  • a5c39e9476 Update README.md Kawrakow 2024-07-24 11:01:16 +02:00
  • 73b94e5c3f iqk_mul_mat(NEON): special case for n not divisible by 8 Kawrakow 2024-07-24 08:02:56 +02:00
  • 6b4167164c iqk_mul_mat(NEON): special case for n not divisible by 8 Iwan Kawrakow 2024-07-24 08:02:56 +02:00
  • 5992d2652b ggml: thread syncronization on Arm Kawrakow 2024-07-24 07:57:47 +02:00
  • 2e49f0172f ggml: thread syncronization on Arm Iwan Kawrakow 2024-07-24 07:57:47 +02:00
  • 005674cecc Fix "make it work for row sizes that are multiple of 4 on NEON" Kawrakow 2024-07-22 12:28:18 +02:00
  • abb740c9a4 Fix "make it work for row sizes that are multiple of 4 on NEON" Iwan Kawrakow 2024-07-22 12:28:18 +02:00
  • 847588cc92 Update README.md Kawrakow 2024-07-23 18:05:05 +02:00
  • 0117e386b3 Update README.md Kawrakow 2024-07-23 18:05:05 +02:00
  • 97680f602c Update README.md Kawrakow 2024-07-23 12:23:06 +02:00
  • 11e2472c64 Update README.md Kawrakow 2024-07-23 12:23:06 +02:00
  • 86d94862ae iqk_soft_max ik/mul_mat_ext Iwan Kawrakow 2024-07-22 16:34:42 +02:00
  • 412bc31c75 Extended mul mat: C = alpha * A * B + beta Iwan Kawrakow 2024-07-22 09:26:55 +03:00
  • 8bf126c1d6 When tokenizer info is missing in the model, use llama3 by default Kawrakow 2024-07-19 12:29:01 +03:00
  • 99119ec29c When tokenizer info is missing in the model, use llama3 by default Iwan Kawrakow 2024-07-19 12:29:01 +03:00
  • 6a94ca46ad iqk_mul_mat(f16): make it work for row sizes that are multiple of 4 on NEON Kawrakow 2024-07-18 13:55:51 +02:00
  • 30b8bcf1a3 iqk_mul_mat(f16): make it work for row sizes that are multiple of 4 on NEON Iwan Kawrakow 2024-07-18 13:55:51 +02:00
  • 4d1e83f8b8 iqk_mul_mat: attentions matrix multiplications Kawrakow 2024-07-18 14:00:56 +03:00
  • 8db01c0804 iqk_mul_mat: attentions matrix multiplications Iwan Kawrakow 2024-07-18 14:00:56 +03:00
  • c14a6a6862 iqk_mul_mat(float): make it work for row sizes that are multiple of 4 on AVX2 Kawrakow 2024-07-18 11:39:32 +03:00
  • 744eb9ffa9 iqk_mul_mat(float): make it work for row sizes that are multiple of 4 on AVX2 Iwan Kawrakow 2024-07-18 11:39:32 +03:00
  • d556b1d809 Fix Makefile, add GGML_USE_IQK_MULMAT ifdefs to iqk-quantize Kawrakow 2024-07-17 16:51:34 +03:00
  • 6a132862fd Fix Makefile, add GGML_USE_IQK_MULMAT ifdefs to iqk-quantize Iwan Kawrakow 2024-07-17 16:51:34 +03:00
  • 6f0805a3c7 iq1bn: faster scalar dot product Kawrakow 2024-07-17 16:09:01 +03:00
  • a4017cc047 iq1bn: faster scalar dot product Iwan Kawrakow 2024-07-17 16:09:01 +03:00
  • 02dc036187 iq1bn: fix scalar dot product Kawrakow 2024-07-17 13:37:18 +03:00
  • a0df4002fc iq1bn: fix scalar dot product Iwan Kawrakow 2024-07-17 13:37:18 +03:00
  • 04decf3fc5 iq1bn: faster AVX2 Kawrakow 2024-07-17 10:17:05 +03:00
  • 7024ecfeb4 iq1bn: faster AVX2 ik/new_iq1bn Iwan Kawrakow 2024-07-17 10:17:05 +03:00