Commit Graph

  • 59c2e77869 Zen4 Flash Attention (#32) Kawrakow 2024-09-01 16:08:21 +03:00
  • dc023bc3be Zen4 Flash Attention (#32) Kawrakow 2024-09-01 16:08:21 +03:00
  • a66d1fc562 Update FlashAttn comment ik/zen4_flash_attn Iwan Kawrakow 2024-09-01 12:47:50 +03:00
  • 94439ea73c Flash attention refinements Iwan Kawrakow 2024-09-01 11:16:07 +03:00
  • 8694617908 Add flash attention with soft-cap and fix D = 256 case Iwan Kawrakow 2024-08-31 19:03:40 +03:00
  • 412c6a98af Zen4 flash attention: moving useful parts from the kq_fused_softmax branch Iwan Kawrakow 2024-08-31 17:05:50 +03:00
  • 1b834ac6e4 Flash attention: templated implementation ik/kq_fused_softmax Iwan Kawrakow 2024-08-31 13:10:36 +03:00
  • 5ff997021e Fix build when iqk_mul_mat is disabled (#31) Kawrakow 2024-08-31 09:11:42 +03:00
  • dbb1db9899 Fix build when iqk_mul_mat is disabled (#31) Kawrakow 2024-08-31 09:11:42 +03:00
  • eb046ae5f3 Fix build when iqk_mul_mat is disabled ik/fix_no_iqk_build Iwan Kawrakow 2024-08-31 08:52:08 +03:00
  • 6d9510c680 WIP Iwan Kawrakow 2024-08-31 07:15:38 +03:00
  • 77b7baaff7 WIP Iwan Kawrakow 2024-08-30 17:55:36 +03:00
  • 71f5b941bf Experimenting with flash attention on Zen4 Iwan Kawrakow 2024-08-30 15:37:52 +03:00
  • 30b89f0cf9 Experimenting with flash attention on Zen4 Iwan Kawrakow 2024-08-30 09:21:37 +03:00
  • e4959f9e46 Experimenting with flash attention on Zen4 Iwan Kawrakow 2024-08-30 08:42:34 +03:00
  • 92adf7e6df Experimenting with flash attention on Zen4 Iwan Kawrakow 2024-08-30 07:38:54 +03:00
  • b5df88b120 Experimenting with flash attention on Zen4 Iwan Kawrakow 2024-08-29 17:52:56 +03:00
  • a02e78d5c1 WIP Iwan Kawrakow 2024-08-29 10:06:29 +03:00
  • 3b4fe65e1c Minor ik/kq_mask Iwan Kawrakow 2024-08-28 18:31:27 +03:00
  • 4d10f4e0ba WIP KQ binary mask Iwan Kawrakow 2024-08-28 18:14:03 +03:00
  • 97dbc16e86 WIP KQ binary mask Iwan Kawrakow 2024-08-28 16:42:49 +02:00
  • 316345c535 WIP KQ binary mask Iwan Kawrakow 2024-08-28 17:27:48 +03:00
  • a8b762ddd9 Minor Iwan Kawrakow 2024-08-28 15:07:09 +02:00
  • 05f95229a7 WIP KQ binary mask: make it a parameter, turn on via command line Iwan Kawrakow 2024-08-28 15:01:02 +02:00
  • fe825ecbe4 WIP KQ binary mask: Metal soft_max Iwan Kawrakow 2024-08-28 13:04:23 +02:00
  • 900a39bec9 WIP KQ binary mask: Metal Iwan Kawrakow 2024-08-28 11:40:26 +02:00
  • 62d6ef2892 WIP KQ binary mask: for now, just use fp16 when flash attention is on Iwan Kawrakow 2024-08-28 10:51:59 +03:00
  • 1216a43719 WIP KQ binary mask: CUDA Iwan Kawrakow 2024-08-28 10:03:10 +03:00
  • 511c459232 WIP: play with KQ mask - make it binary Iwan Kawrakow 2024-08-28 09:08:49 +03:00
  • 16b8d3d229 WIP Iwan Kawrakow 2024-08-27 19:45:57 +03:00
  • 0f301124b1 WIP: play with KQ mask - make it fp16 Iwan Kawrakow 2024-08-27 19:08:31 +03:00
  • 3f7899c250 Faster Gemma2 (#27) Kawrakow 2024-08-27 17:40:59 +03:00
  • c7e99c88a2 Faster Gemma2 (#27) Kawrakow 2024-08-27 17:40:59 +03:00
  • e4f200098b Flash attention with softcap: Metal ik/fused_softcap_softmax Iwan Kawrakow 2024-08-26 18:34:43 +02:00
  • 1ad3b25132 soft_cap_max: Metal Iwan Kawrakow 2024-08-26 17:55:20 +02:00
  • 46862d725b Add softcap to flash attention Iwan Kawrakow 2024-08-26 18:22:29 +03:00
  • 7168adfe71 soft_cap_max: looks good on CPU and CUDA Iwan Kawrakow 2024-08-26 15:27:11 +03:00
  • ec0ae14aee WIP: various, nothing is really better Iwan Kawrakow 2024-08-26 14:25:04 +03:00
  • 3bdf0d8df6 WIP softmax: ~3% gain on Zen4 Iwan Kawrakow 2024-08-26 09:45:48 +03:00
  • b9c60d4717 WIP Iwan Kawrakow 2024-08-26 07:54:39 +03:00
  • 585aa2bee3 WIP: plugging into ggml_compute_forward_flash_attn_ext_f16 Iwan Kawrakow 2024-08-25 19:08:28 +03:00
  • cfee1b68ec WIP: plugging into ggml_compute_forward_flash_attn_ext_f16 Iwan Kawrakow 2024-08-24 14:31:13 +03:00
  • 490d1a313a WIP: plugging into ggml_compute_forward_flash_attn_ext_f16 Iwan Kawrakow 2024-08-23 19:43:39 +03:00
  • 31ed9b331e WIP: plugging into ggml_compute_forward_flash_attn_ext_f16 Iwan Kawrakow 2024-08-23 16:48:35 +03:00
  • ffeb8b40eb WIP: plugging into ggml_compute_forward_flash_attn_ext_f16 Iwan Kawrakow 2024-08-23 15:47:08 +03:00
  • b127c6cced WIP: Fusing K*Q and softmax - not working yet Iwan Kawrakow 2024-08-23 09:56:28 +03:00
  • 2dbb3d70bf soft_cap_max: WIP - something is wrong with CUDA Iwan Kawrakow 2024-08-21 14:58:00 +03:00
  • 6e5d728040 soft_cap_max: initial CPU version of fused softcap + soft_max Iwan Kawrakow 2024-08-21 13:31:56 +03:00
  • 1268ef9430 softcap: minor improvement (#24) Kawrakow 2024-08-21 13:00:09 +03:00
  • bd99ed7d0a softcap: minor improvement (#24) Kawrakow 2024-08-21 13:00:09 +03:00
  • c9116e9eca softcap: minor improvement ik/softcap_minor Iwan Kawrakow 2024-08-21 12:34:35 +03:00
  • 8a10467990 Fused soft cap and SIMD-ified GeLU (#9) Kawrakow 2024-08-20 17:15:47 +03:00
  • d259a50ca6 Fused soft cap and SIMD-ified GeLU (#9) Kawrakow 2024-08-20 17:15:47 +03:00
  • 38dcba95fe iq4_k: use iq5_k also when n_gqa = 2 (#23) Kawrakow 2024-08-20 17:15:06 +03:00
  • a325745000 iq4_k: use iq5_k also when n_gqa = 2 (#23) Kawrakow 2024-08-20 17:15:06 +03:00
  • cc3d42e60b softcap, tanh: avoid NaNs for large arguments (NEON) ik/softcap Iwan Kawrakow 2024-08-20 14:46:30 +02:00
  • 4257ea1e92 llama-bench: add ability to turn off warmup runs Iwan Kawrakow 2024-08-20 14:44:01 +02:00
  • ad456dc25b softcap, tanh: avoid NaNs for large arguments (AVX2, AVX512) Iwan Kawrakow 2024-08-20 14:42:48 +03:00
  • 3e97ec87a2 iq4_k: use iq5_k also when n_gqa = 2 ik/iq4_k_tweaks Iwan Kawrakow 2024-08-20 09:29:45 +03:00
  • d50f4f9439 Simdified gelu Iwan Kawrakow 2024-08-02 16:19:11 +03:00
  • 0e2d76bb7c softcap: Metal and NEON Iwan Kawrakow 2024-08-02 07:41:52 +02:00
  • e49ce89901 softcap: CUDA Iwan Kawrakow 2024-08-02 06:34:29 +03:00
  • a8a0c692d4 softcap: CUDA Iwan Kawrakow 2024-08-01 21:09:40 +03:00
  • c4951cbc35 Softcap: WIP Iwan Kawrakow 2024-08-01 20:32:28 +03:00
  • 0a3b725e60 AVX2 quantization for Q8_K (#22) Kawrakow 2024-08-19 15:33:27 +03:00
  • a73702d93b AVX2 quantization for Q8_K (#22) Kawrakow 2024-08-19 15:33:27 +03:00
  • 6b6e2f2dbc AVX2 quantization for Q8_K ik/quantize_q8k_avx2 Iwan Kawrakow 2024-08-19 15:31:55 +03:00
  • fad55b735e quantize_stats: print rmse and max error as fraction of <x> (#21) Kawrakow 2024-08-19 13:49:28 +03:00
  • 5652100afc quantize_stats: print rmse and max error as fraction of <x> (#21) Kawrakow 2024-08-19 13:49:28 +03:00
  • ff471dfd61 quantize_stats: print rmse and max error as fraction of <x> ik/quantize_stats Iwan Kawrakow 2024-08-19 13:47:19 +03:00
  • 041d79925c iq2_k: slightly better bpw - accuracy compromise (#20) Kawrakow 2024-08-19 13:36:51 +03:00
  • c7b47fc67f iq2_k: slightly better bpw - accuracy compromise (#20) Kawrakow 2024-08-19 13:36:51 +03:00
  • b2212f170c iq2_k: slightly better bpw - accuracy compromise ik/iq2_k_tweak Iwan Kawrakow 2024-08-19 13:33:19 +03:00
  • a58853bf5e Skip barriers of noops (#19) Kawrakow 2024-08-14 10:40:09 +02:00
  • 6c5384f20e Skip barriers of noops (#19) Kawrakow 2024-08-14 10:40:09 +02:00
  • 686e75650e Skip barriers of noops ik/skip_noop_barriers Iwan Kawrakow 2024-08-14 09:49:12 +03:00
  • 25ade24526 Update README.md Kawrakow 2024-08-12 15:16:00 +02:00
  • bb5ff6fade Update README.md Kawrakow 2024-08-12 15:16:00 +02:00
  • 1a4cfbcc53 Merge mainline - Aug 12 2024 (#17) Kawrakow 2024-08-12 15:14:32 +02:00
  • 8f43e55103 Merge mainline - Aug 12 2024 (#17) Kawrakow 2024-08-12 15:14:32 +02:00
  • 28bb16556d Remove CI check ik/merge_Aug_12_2024 Iwan Kawrakow 2024-08-12 12:28:44 +03:00
  • 5f9350e2a1 Fix after merge Iwan Kawrakow 2024-08-12 12:13:56 +03:00
  • 3188517cf4 Merge mainline Iwan Kawrakow 2024-08-12 11:54:01 +03:00
  • 5ed6d94cb5 Fix Makefile Kawrakow 2024-08-09 17:29:32 +03:00
  • f5d1af61d7 Fix Makefile Iwan Kawrakow 2024-08-09 17:29:32 +03:00
  • 2c9aaae809 Fix Makefile ik/fix_Makefile Iwan Kawrakow 2024-08-09 17:29:32 +03:00
  • af2bb96de5 Fix Zen4 implementation of iq3_k, iq4_k, iq5_k Kawrakow 2024-08-09 10:32:07 +03:00
  • f0d7a0d53b Fix Zen4 implementation of iq3_k, iq4_k, iq5_k Iwan Kawrakow 2024-08-09 10:32:07 +03:00
  • 3f67708b91 iq6_k: AVX2 Kawrakow 2024-08-09 06:58:55 +03:00
  • c77dba5273 iq6_k: AVX2 Iwan Kawrakow 2024-08-09 06:58:55 +03:00
  • fa668c7dcb iq6_k: Metal Kawrakow 2024-08-08 16:27:43 +02:00
  • a829cb7794 iq6_k: Metal Iwan Kawrakow 2024-08-08 16:27:43 +02:00
  • ed462a512a iq6_k: NEON Kawrakow 2024-08-08 15:51:10 +02:00
  • 48c4389e3d iq6_k: NEON Iwan Kawrakow 2024-08-08 15:51:10 +02:00
  • ef32a01c2a iq6_k: slightly better Zen4 iqk_mul_mat Kawrakow 2024-08-08 14:01:08 +03:00
  • 595d2ae32d iq6_k: slightly better Zen4 iqk_mul_mat Iwan Kawrakow 2024-08-08 14:01:08 +03:00
  • 0bee1c0c0a iq6_k: Zen4 iqk_mul_mat Kawrakow 2024-08-08 11:17:42 +03:00
  • 849476acc7 iq6_k: Zen4 iqk_mul_mat Iwan Kawrakow 2024-08-08 11:17:42 +03:00
  • 1593acd09a iq6_k: CUDA dot product Kawrakow 2024-08-07 19:24:09 +03:00
  • 050bdfa101 iq6_k: CUDA dot product Iwan Kawrakow 2024-08-07 19:24:09 +03:00