Commit Graph

  • afa2323c4c q4_0_r8 (NEON) Iwan Kawrakow 2025-01-26 17:46:57 +02:00
  • e9c74af22b q4_0_r8 (AVX2) Iwan Kawrakow 2025-01-26 16:04:28 +02:00
  • 56ca4c3ba9 FA: repack Q8_0 to Q8_0_R8 (NEON) ik/iq4_xs_r8_v2 Iwan Kawrakow 2025-01-26 12:24:38 +02:00
  • 3484ee6ddb Remove special purpose mul_mat_q8_0_r4_q8_1_128 (Zen4) Iwan Kawrakow 2025-01-26 11:34:57 +02:00
  • cc438189d5 FA: repack Q8_0 to Q8_0_R8 Iwan Kawrakow 2025-01-26 10:50:48 +02:00
  • 4de6088eef 8-rows interleaved q8_0 (NEON) Iwan Kawrakow 2025-01-26 09:43:22 +02:00
  • 45075579ef 8-rows interleaved q8_0 (Zen4) - slightly better Iwan Kawrakow 2025-01-26 07:45:08 +02:00
  • 1774ef6b07 8-rows interleaved q8_0 (Zen4) Iwan Kawrakow 2025-01-26 07:11:42 +02:00
  • 1053ac50fe 8-rows interleaved q8_0 (AVX2) Iwan Kawrakow 2025-01-26 06:24:35 +02:00
  • 3bfe569348 Cleanup Iwan Kawrakow 2025-01-25 17:22:35 +02:00
  • 9354ea22f6 Try interleaving 8 iq4_xs rows Iwan Kawrakow 2025-01-25 15:17:23 +02:00
  • 1ac69af2fe Try interleaving 8 rows for iq4_xs Iwan Kawrakow 2025-01-25 11:01:44 +02:00
  • 385a4f596c Merge remote-tracking branch 'origin/main' into ik/try_trellis Iwan Kawrakow 2025-01-24 15:49:23 +02:00
  • 4e84067f88 Update chat templates (#177) Kawrakow 2025-01-24 06:30:10 +02:00
  • 814d3e054c Update chat templates (#177) Kawrakow 2025-01-24 06:30:10 +02:00
  • bb23d014ab Removing missed conflict marker ik/chat_templates Iwan Kawrakow 2025-01-23 19:31:49 +02:00
  • b8eff06bb0 Adopting chat template stuff from llama.cpp Iwan Kawrakow 2025-01-23 18:24:51 +02:00
  • 5c0a01bdaf Deepseek V3 support added (#176) saood06 2025-01-23 10:24:10 -06:00
  • 2195632581 Deepseek V3 support added (#176) saood06 2025-01-23 10:24:10 -06:00
  • 6d1b4adaac Add Deepseek-R1-Distill pre-tokenizer Kawrakow 2025-01-23 13:10:03 +02:00
  • c2624b2fd3 Add Deepseek-R1-Distill pre-tokenizer Iwan Kawrakow 2025-01-23 13:10:03 +02:00
  • d868ca149a Disable mul_mat_Qx_Qy_Mx1 on AVX2 ik/gemv_bf16_r16 Iwan Kawrakow 2025-01-23 11:58:42 +02:00
  • 8a5a81b4dc Slightly better gemv for not repacked fp16 Iwan Kawrakow 2025-01-23 09:04:07 +02:00
  • 4941c043bb Improve gemv for bf16_r16 Iwan Kawrakow 2025-01-23 08:29:48 +02:00
  • ccd8523bba Better BF16 support on AVX2 (#175) Kawrakow 2025-01-22 12:13:55 +02:00
  • dbf5d31d01 Better BF16 support on AVX2 (#175) Kawrakow 2025-01-22 12:13:55 +02:00
  • cc7642c757 Slightly faster fp16/bf16 gemv on AVX2 ik/avx2_bf16 Iwan Kawrakow 2025-01-22 09:03:57 +02:00
  • 2c2f728afc Adding BF16 support for AVX2 Iwan Kawrakow 2025-01-22 08:37:48 +02:00
  • 09d4a8ad90 On Zen4 repack fp16 models to bf16_r16 when run-time-repacking is requested (#174) Kawrakow 2025-01-21 19:19:38 +02:00
  • 6d23495b9b On Zen4 repack fp16 models to bf16_r16 when run-time-repacking is requested (#174) Kawrakow 2025-01-21 19:19:38 +02:00
  • ef2b0066b9 On Zen4 repack fp16 models to bf16_r16 when run-time-repacking is requested ik/zen4_repack_f16 Iwan Kawrakow 2025-01-21 19:14:57 +02:00
  • 1e44bdf6e5 More Flash Attention improvements (#173) Kawrakow 2025-01-20 08:57:38 +02:00
  • 3c5f87225f More Flash Attention improvements (#173) Kawrakow 2025-01-20 08:57:38 +02:00
  • 31d7424afb FA: turn off performance timer ik/fattn_kqv Iwan Kawrakow 2025-01-19 18:37:46 +02:00
  • 4ecfaaea48 FA: dedicated mat mul for D = 128 also for ARM_NEON Iwan Kawrakow 2025-01-19 16:29:03 +01:00
  • e9951656f8 FA: vectorize q8_0 -> q8_0_r4 repacking also on NEON Iwan Kawrakow 2025-01-19 15:58:11 +01:00
  • 2480624b68 FA: fix ARM_NEON Iwan Kawrakow 2025-01-19 14:43:51 +01:00
  • 38b8b062bd FA: Fix AVX2 Iwan Kawrakow 2025-01-19 15:14:06 +02:00
  • cf0351f803 FA: faster q8_0 cache via run-time-repacking Iwan Kawrakow 2025-01-19 14:06:57 +02:00
  • 96ce347243 FA: timing Iwan Kawrakow 2025-01-18 13:47:08 +02:00
  • 7efe16f715 FA: don't store sum scaling factor in SIMD registers Iwan Kawrakow 2025-01-18 09:44:32 +02:00
  • 0e8cfb3d78 FA: slightly faster V*softmax(K*Q) on Zen4 Iwan Kawrakow 2025-01-18 08:35:01 +02:00
  • fe69411196 FA: slightly faster V*softmax(K*Q) also for fp16 K-cache Iwan Kawrakow 2025-01-17 14:39:46 +01:00
  • 884b2ed85a Deleted forgotten commented out code Iwan Kawrakow 2025-01-16 18:00:55 +01:00
  • 04ada68b84 FA: it is also faster on AVX2 and ARM_NEON Iwan Kawrakow 2025-01-16 17:57:37 +01:00
  • 4753c861d1 FA: slightly faster V*softmax(K*Q) on Zen4 Iwan Kawrakow 2025-01-16 17:10:42 +02:00
  • c606c19101 CPU Flash Attention improvements (#172) Kawrakow 2025-01-15 18:19:22 +02:00
  • 0b74397d59 CPU Flash Attention improvements (#172) Kawrakow 2025-01-15 18:19:22 +02:00
  • 3e7d5c180c On Zen4 it is also better to not use large Q steps for fp16 K-cache ik/fattn_bf16 Iwan Kawrakow 2025-01-15 18:09:07 +02:00
  • 3f4425205a FA: don't use large Q steps on AVX2 for fp16 K-cache Iwan Kawrakow 2025-01-15 17:48:14 +02:00
  • 37162d2695 Fix AVX2 Iwan Kawrakow 2025-01-15 14:32:23 +02:00
  • 0ecc20e481 Fix q8_0 KV cache when not using FA - NEON Iwan Kawrakow 2025-01-15 11:36:01 +01:00
  • ad78678bb9 Fix q8_0 KV cache when not using FA - WIP (AVX2) Iwan Kawrakow 2025-01-15 12:13:08 +02:00
  • 093cf3ec9b FA: slightly better quantized kv-cache speed for large contexts Iwan Kawrakow 2025-01-15 09:03:46 +02:00
  • 379ca23e1d FA: much better bf16 kv-cache speed for large contexts Iwan Kawrakow 2025-01-14 18:14:47 +02:00
  • 2b58f31b36 FA: allow bf16 for V-cache with any supported K-cache Iwan Kawrakow 2025-01-14 11:40:05 +02:00
  • 7623f769c0 Slightly faster FA for Q8_0 KV cache Iwan Kawrakow 2025-01-14 10:21:11 +02:00
  • 2afe2e1d41 Slightly faster FA for bf16 KV cache Iwan Kawrakow 2025-01-13 17:47:47 +02:00
  • c6503556b7 Fix the strange FA behavior with odd/even batch sizes (#171) Kawrakow 2025-01-12 16:51:06 +02:00
  • 49b27069fd Fix the strange FA behavior with odd/even batch sizes (#171) Kawrakow 2025-01-12 16:51:06 +02:00
  • 983e86805e Fix the strange FA behavior with odd/even batch sizes ik/fix_fattn_odd_even Iwan Kawrakow 2025-01-12 16:12:05 +02:00
  • 7d107ee10e MoE fix for R4 quants (#170) Kawrakow 2025-01-12 13:19:14 +02:00
  • c19404bcda MoE fix for R4 quants (#170) Kawrakow 2025-01-12 13:19:14 +02:00
  • e2f8747555 Make sure rows per thread is a multiple of 4 also for MoE when using _r4 quants ik/fix_mul_mat_16 Iwan Kawrakow 2025-01-12 11:39:52 +02:00
  • 4e7ce22614 Fix bug in iqk_mul_mat Iwan Kawrakow 2025-01-12 11:00:04 +02:00
  • 400b774294 Be able to re-quantize MS BitNet I2_S models (#169) Kawrakow 2025-01-10 18:18:04 +02:00
  • 7553989dd8 Be able to re-quantize MS BitNet I2_S models (#169) Kawrakow 2025-01-10 18:18:04 +02:00
  • 882f52032a Be able to re-quantize MS BitNet I2_S models ik/convert_i2s Iwan Kawrakow 2025-01-10 17:43:44 +02:00
  • c411615505 Falcon3 changes (#168) Kawrakow 2025-01-10 15:06:00 +02:00
  • b1363b6177 Falcon3 changes (#168) Kawrakow 2025-01-10 15:06:00 +02:00
  • 8174b7f538 q8_k16: use integer arithmetic to sum row values ik/falcon3a Iwan Kawrakow 2025-01-10 14:01:01 +02:00
  • 712f348f85 Add Falcon3 pre-tokenizer (same as llama3) Iwan Kawrakow 2025-01-10 13:36:58 +02:00
  • 9f8cd4049a q8_k16: use integer arithmetic to sum row values ik/falcon3 Iwan Kawrakow 2025-01-10 14:01:01 +02:00
  • 94d151886c Add Falcon3 pre-tokenizer (same as llama3) Iwan Kawrakow 2025-01-10 13:36:58 +02:00
  • 8bc80e08d0 Adapt to iq4_nl_x4 -> iq4_nl_r4 change ik/cuda_q4_0_r4 Iwan Kawrakow 2024-12-08 10:50:53 +02:00
  • d9589d82cb cuda q4_0_r4: dot product works Iwan Kawrakow 2024-12-06 18:17:40 +02:00
  • 85a0730447 cuda q4_0_r4: dequantize works Iwan Kawrakow 2024-12-06 15:25:45 +02:00
  • e9cc863487 iq4_0_r4: Use AVX2 version for matrix x vector (#163) Kawrakow 2024-12-23 17:34:08 +01:00
  • 3e6851621c iq4_0_r4: Use AVX2 version for matrix x vector (#163) Kawrakow 2024-12-23 17:34:08 +01:00
  • 1f785d2a3a iq4_0_r4: Use AVX2 version for matrix x vector ik/mv_q4_0_r4 Iwan Kawrakow 2024-12-23 18:30:47 +02:00
  • da3bfd1009 IQ3_S_R4 (#162) Kawrakow 2024-12-23 14:34:23 +01:00
  • 167479e027 IQ3_S_R4 (#162) Kawrakow 2024-12-23 14:34:23 +01:00
  • c794281b8e iq3_s_r4: rearranged quants - NEON ik/iq3_s_r4_v2 Iwan Kawrakow 2024-12-23 13:51:32 +01:00
  • 3fb04caef4 iq3_s_r4: rearranged quants - AVX2 Iwan Kawrakow 2024-12-23 13:28:12 +02:00
  • a2df24a0d9 iq3_s_r4: rearrange quants Iwan Kawrakow 2024-12-23 12:09:23 +02:00
  • aa2595415a MSVC fixes (#161) Kawrakow 2024-12-23 07:57:48 +01:00
  • 1a0a35dcd1 MSVC fixes (#161) Kawrakow 2024-12-23 07:57:48 +01:00
  • b31c3e9103 iq3_s_r4: NEON ik/iq3_s_r4 Iwan Kawrakow 2024-12-22 20:25:47 +01:00
  • ada68c3b37 iq3_s_r4: AVX2 Iwan Kawrakow 2024-12-22 19:52:38 +02:00
  • 4e36826d51 One more ik/fix_windows Iwan Kawrakow 2024-12-22 19:14:51 +02:00
  • 716ad317a2 iq3_s_r4: slightly better Zen4 Iwan Kawrakow 2024-12-22 19:11:07 +02:00
  • f7e22b02e0 iq3_s_r4: Zen4 Iwan Kawrakow 2024-12-22 18:26:10 +02:00
  • 8ef8e50e84 iq3_s_r4: WIP Iwan Kawrakow 2024-12-22 16:51:30 +02:00
  • baf06a0805 MSVC fixes Iwan Kawrakow 2024-12-22 16:49:43 +02:00
  • 2ed8f432a4 Faster R4 legacy quants (#158) Kawrakow 2024-12-22 12:00:22 +01:00
  • 3d732cb010 Faster R4 legacy quants (#158) Kawrakow 2024-12-22 12:00:22 +01:00
  • 4de66b9248 qx_0_r4(AVX2): convert scales with SIMD intrinsics ik/qx_0_r4_avx2 Iwan Kawrakow 2024-12-22 12:26:17 +02:00
  • 7220cc74c0 q4_0_r4(avx2): convert q8_1 scales with SIMD intrinsics Iwan Kawrakow 2024-12-22 11:08:25 +02:00
  • fbf975741e R4 i-quants improvements (#157) Kawrakow 2024-12-22 10:52:56 +01:00
  • 907cde6be2 R4 i-quants improvements (#157) Kawrakow 2024-12-22 10:52:56 +01:00