Commit Graph

  • a8521b73d7 Removed extra column Kawrakow 2024-06-17 19:19:25 +03:00
  • 8ca1bdebe4 bitnet 2 bpw: AVX2 implementation Kawrakow 2024-06-17 19:07:38 +03:00
  • 318899c8b7 bitnet: add 2 bpw quantization Kawrakow 2024-06-17 18:41:30 +03:00
  • f9ba085ef7 Move Q8_K64 quantization to iqk-quantize.cpp and add copyright notice Kawrakow 2024-06-17 17:00:31 +03:00
  • 0efd620d01 iqk_mul_mat(bitnet): fix typo Kawrakow 2024-06-17 16:50:11 +03:00
  • 7b3cb2b96c iqk_mul_mat(bitnet): slightly faster AVX2 Kawrakow 2024-06-17 16:32:25 +03:00
  • e6d8441397 iq1_bn: better NEON implementation Kawrakow 2024-06-17 14:16:24 +02:00
  • 3686304e03 iq1_bn(NEON): works now, but very slow Kawrakow 2024-06-17 13:04:24 +02:00
  • 798697a6ff iq1_bn(Metal): 66.2 -> 67.1 t/s Kawrakow 2024-06-17 12:25:08 +02:00
  • bd266036b6 iq1_bn(Metal): 64 -> 66.2 t/s for TG Kawrakow 2024-06-17 11:51:20 +02:00
  • 7cb77d7a67 iq1_bn(Metal): 64 -> 66.2 t/s for TG Kawrakow 2024-06-17 11:31:56 +02:00
  • 04fed5cd9f iq1_bn(Metal): 60 -> 64 t/s for TG Kawrakow 2024-06-17 11:22:12 +02:00
  • 5d14a2243e iq1_bn: very slightly better Metal dot product Kawrakow 2024-06-17 11:10:56 +02:00
  • 15e1aec7a5 iq1_bn: Metal now works Kawrakow 2024-06-17 10:48:26 +02:00
  • 4b64224645 iqk_mul_mat(iq1_bn): WIP NEON - don't see why it is not working Kawrakow 2024-06-17 08:05:06 +02:00
  • 77d8637925 iqk_mul_mat(iq1_bn): WIP NEON (not working) Kawrakow 2024-06-17 07:26:33 +02:00
  • dfdc4dbee6 iqk_mul_mat: improve iq1_bn (bitnet) on vanilla AVX2 Kawrakow 2024-06-17 08:24:51 +03:00
  • dff96fb5f8 iqk_mul_mat: improve iq1_bn (bitnet) on AVX2 Kawrakow 2024-06-17 08:09:39 +03:00
  • b0967ffa79 bitnet: fix scalar dot product Kawrakow 2024-06-16 16:55:49 +02:00
  • 88e98260bf bitnet: scale is per row, not per tensor Kawrakow 2024-06-16 17:27:18 +03:00
  • 077270395b iqk_mul_mat: add iq1_bn (bitnet) Kawrakow 2024-06-16 17:13:06 +03:00
  • eecd48eab5 bitnet: CUDA, scalar, AVX2 Kawrakow 2024-06-16 15:56:32 +03:00
  • 81576cdcac bitnet: python + llama Kawrakow 2024-06-16 14:25:12 +03:00
  • f9490aea46 iqk_mul_mat: cleanup Kawrakow 2024-06-11 15:12:54 +03:00
  • 389e6220e9 iqk_mul_mat: be independent of llamafile_sgemm Kawrakow 2024-06-11 10:33:51 +03:00
  • 915a1b2665 iqk_mul_mat: be independent of llamafile_sgemm (WIP) Kawrakow 2024-06-11 09:12:22 +02:00
  • cc628b2e39 Fix nb4 Kawrakow 2024-06-10 17:56:55 +02:00
  • d41aef5418 iqk_mul_mat: add ability to disable it Kawrakow 2024-06-10 18:30:33 +03:00
  • 154f56a8de iqk_mul_mat: be able to handle any f16/f32 combination on AVX2 Kawrakow 2024-06-10 16:43:42 +03:00
  • 1211a4b5d0 iqk_mul_mat: turn on AVX512 Kawrakow 2024-06-10 12:25:27 +03:00
  • dfcb8bebc5 iqk_mul_mat: slightly better fp16 with 16 vector registers Kawrakow 2024-06-10 11:40:32 +03:00
  • 9dba81ddf2 iqk_mul_mat: better fp16 for AVX2 Kawrakow 2024-06-10 09:53:26 +03:00
  • baf6aaa31b iqk_mul_mat: fp16 for Arm Kawrakow 2024-06-10 08:16:52 +02:00
  • 6ec0fcc5c7 iqk_mul_mat: slightly faster FANCY_SIMD dot product Kawrakow 2024-06-09 18:01:52 +03:00
  • 5812618409 iqk_mul_mat: fix q8_0 Kawrakow 2024-06-08 13:47:02 +03:00
  • 7f91151c2e iqk_mul_mat: decouple from llamafile also in cmake Kawrakow 2024-06-08 10:00:19 +03:00
  • 8b03121c33 iqk_mul_mat: make it build with the Makefile Kawrakow 2024-06-08 09:55:47 +03:00
  • c7870afaad iqk_mul_mat: use block_q8_1_x4 also for AVX2 Kawrakow 2024-06-08 09:02:23 +03:00
  • 5b19e5e4a9 iqk_mul_mat: use block_q8_0_x4 also for AVX2 Kawrakow 2024-06-08 08:20:26 +03:00
  • 30a0bf30fa iqk_mul_mat: delete unused stuff Kawrakow 2024-06-07 18:19:02 +03:00
  • 64da6f7a97 iqk_mul_mat: add q8_0 Kawrakow 2024-06-07 17:43:29 +03:00
  • f2ced256b4 iqk_mul_mat: fp16 tweaks Kawrakow 2024-06-07 15:21:16 +03:00
  • b4ecd2dce6 iqk_mul_mat: fp16 implementation cleanup Kawrakow 2024-06-07 14:46:29 +03:00
  • e0b52e14a6 iqk_mul_mat: fp16 implementation for AVX2 Kawrakow 2024-06-07 14:23:32 +03:00
  • 2328da1aa7 iqk_mul_mat: multi-thread quantization also for MoE models Kawrakow 2024-06-07 11:30:17 +03:00
  • ea239f8572 iqk_mul_mat: make it independent of sgemm Kawrakow 2024-06-07 09:43:33 +03:00
  • 5039ea8930 iqk_mul_mat: minor improvements Kawrakow 2024-06-05 19:43:08 +03:00
  • e85753e1ad iqk_mul_mat: no more templates in the IQ dequantizers Kawrakow 2024-06-05 17:01:44 +03:00
  • b8556267cd iqk_mul_mat: remove template on one of the prepare() functions Kawrakow 2024-06-05 15:24:37 +03:00
  • 44b1b4fb97 iqk_mul_mat: experimenting with zen4 Kawrakow 2024-06-05 12:41:55 +03:00