Commit Graph

  • 71bc74d738 Add missing enum values for qwen3 and qwen3moe (#356) Kawrakow 2025-04-29 10:05:38 +02:00
  • 9ba362706c Add missing enum values for qwen3 and qwen3moe (#356) Kawrakow 2025-04-29 10:05:38 +02:00
  • b036119637 Add missing enum values for qwen3 and qwen3moe ik/add_missing_enum_values_qwen3 Iwan Kawrakow 2025-04-29 11:04:38 +03:00
  • 8b62ee32ca Apply Qwen3 PR from llama.cpp (#355) Ben Harris 2025-04-29 16:02:08 +08:00
  • 1064f5bc31 Apply Qwen3 PR from llama.cpp (#355) Ben Harris 2025-04-29 16:02:08 +08:00
  • 2f2803a1d7 Update AUTHORS Kawrakow 2025-04-29 07:22:06 +02:00
  • 99b87a375f Update AUTHORS Kawrakow 2025-04-29 07:22:06 +02:00
  • 9d9f9f96b2 CPU FA improvements (#351) Kawrakow 2025-04-29 07:19:43 +02:00
  • cda24b58cb CPU FA improvements (#351) Kawrakow 2025-04-29 07:19:43 +02:00
  • 5d65eaaf29 Edits Iwan Kawrakow 2025-04-28 17:50:49 +03:00
  • f0b1d049a3 Update README.md Kawrakow 2025-04-28 16:27:44 +02:00
  • 1f77976476 Update README.md ikawrakow-patch-1 Kawrakow 2025-04-28 16:25:48 +02:00
  • 20d50172d0 Much better FA TG with q8_0 KV cache ik/fattn_work_buffer Iwan Kawrakow 2025-04-28 11:26:28 +03:00
  • 802d4de1b5 WIP Iwan Kawrakow 2025-04-24 18:38:41 +03:00
  • 9be8b490b1 WIP Iwan Kawrakow 2025-04-24 09:25:13 +03:00
  • cd44692bc0 Use Sum4q4 for q4_0 Iwan Kawrakow 2025-04-23 15:43:59 +03:00
  • b19fd13141 Use mul_mat_qX_0_q8_2_Tx for q4_0 in FA Iwan Kawrakow 2025-04-23 14:35:34 +03:00
  • ddcdf25e54 Use mul_mat_qX_0_q8_2_Tx for q6_0 in FA Iwan Kawrakow 2025-04-23 14:12:08 +03:00
  • 9f310ea663 Try to improve for unusual number of heads/number of threads Iwan Kawrakow 2025-04-22 12:37:11 +03:00
  • 39714026fe WIP Iwan Kawrakow 2025-04-21 19:03:26 +03:00
  • a7cd27f7e0 WIP (Zen4) Iwan Kawrakow 2025-04-21 11:27:33 +03:00
  • 26eb64c4f9 Slightly better Iwan Kawrakow 2025-04-21 09:40:58 +03:00
  • bcacf33350 WIP Iwan Kawrakow 2025-04-19 09:12:50 +03:00
  • 998c1b2117 WIP Iwan Kawrakow 2025-04-18 19:09:24 +03:00
  • fae18dd0bc WIP Iwan Kawrakow 2025-04-18 17:00:06 +03:00
  • b498633203 WIP Iwan Kawrakow 2025-04-18 13:50:30 +03:00
  • 74a21d48d6 Add header to avoid compiler warnings Iwan Kawrakow 2025-04-18 09:08:25 +03:00
  • 6801a4368c FA: provide work buffer for K repacking Iwan Kawrakow 2025-04-17 16:14:40 +03:00
  • 42d7e58a96 Add GLM-4-0414 Model Support (#344) ubergarm 2025-04-26 11:34:04 -04:00
  • baeefb4731 Add GLM-4-0414 Model Support (#344) ubergarm 2025-04-26 11:34:04 -04:00
  • 815307d3bd Fix division by zero bug (#349) Kawrakow 2025-04-26 09:19:43 +02:00
  • 9e846f0eb1 Fix division by zero bug (#349) Kawrakow 2025-04-26 09:19:43 +02:00
  • 957308ca09 Fix division by zero bug ik/fix_div_zero Iwan Kawrakow 2025-04-26 10:08:37 +03:00
  • 86be28d5bd Add support for Cohere2 (#341) Kawrakow 2025-04-26 08:13:25 +02:00
  • 715fc552ad Add support for Cohere2 (#341) Kawrakow 2025-04-26 08:13:25 +02:00
  • 4413f17b58 Fix q4_1 and q5_1 on Arm (#348) Kawrakow 2025-04-25 19:48:08 +02:00
  • 770892086c Fix q4_1 and q5_1 on Arm (#348) Kawrakow 2025-04-25 19:48:08 +02:00
  • 78458aa83d Fix q4_1 and q5_1 on Arm ik/fix_q41_q51_arm Iwan Kawrakow 2025-04-25 19:42:21 +02:00
  • 95675f6194 Command-A needs fp32 precision for K*Q ik/cohere2 Iwan Kawrakow 2025-04-25 15:51:03 +03:00
  • eb47293337 Fix IQ4_NL on AVX2 Iwan Kawrakow 2025-04-25 15:37:06 +03:00
  • e6a7c26dc5 Add support for Cohere2 Iwan Kawrakow 2025-04-23 09:08:43 +03:00
  • fb98619852 Add ability to manually set arch flags (#347) Kawrakow 2025-04-25 13:24:18 +02:00
  • c817160d03 Add ability to manually set arch flags (#347) Kawrakow 2025-04-25 13:24:18 +02:00
  • d641a5fef3 Add ability to manually set arch flags ik/arch_flags Iwan Kawrakow 2025-04-25 11:41:49 +02:00
  • 542351d088 Fix FA on ARM (#346) Kawrakow 2025-04-25 11:01:08 +02:00
  • 25d1a0dca8 Fix FA on ARM (#346) Kawrakow 2025-04-25 11:01:08 +02:00
  • 160bf27714 Fix FA on ARM ik/fix_arm_fa Iwan Kawrakow 2025-04-25 10:58:05 +02:00
  • c26f5b315d Fix LLaMA-4 attention (#342) Kawrakow 2025-04-25 09:21:03 +02:00
  • f176122a3d Fix LLaMA-4 attention (#342) Kawrakow 2025-04-25 09:21:03 +02:00
  • 2d2a03df24 cuda: use switch in constexpr funcs (#343) Kawrakow 2025-04-24 17:37:12 +02:00
  • c9eec1729f cuda: use switch in constexpr funcs (#343) Kawrakow 2025-04-24 17:37:12 +02:00
  • f71763c2d2 cuda: use switch in constexpr funcs ik/pickup_13095 Iwan Kawrakow 2025-04-24 18:34:00 +03:00
  • 6250937c49 Fix LLaMA-4 attention ik/fix_llama4_attention Iwan Kawrakow 2025-04-24 13:59:19 +03:00
  • bf095b682f Update gguf-py constants (#298) saood06 2025-04-24 00:34:10 -05:00
  • 222a195743 Update gguf-py constants (#298) saood06 2025-04-24 00:34:10 -05:00
  • adb6b6fb3f Update GGML_QUANT_SIZES s6/fix_python Saood Karim 2025-04-23 23:06:26 -05:00
  • 614e59733e BitNet adjustments (#338) Kawrakow 2025-04-22 08:46:31 +02:00
  • 9dac3edf2f BitNet adjustments (#338) Kawrakow 2025-04-22 08:46:31 +02:00
  • e79f523bcc BitNet adjustments ik/bitnet_adjustments Iwan Kawrakow 2025-04-22 09:36:32 +03:00
  • e6c85a5b95 Add support for bitnet2b_2501 model (#337) saood06 2025-04-22 01:34:13 -05:00
  • cc39800723 Add support for bitnet2b_2501 model (#337) saood06 2025-04-22 01:34:13 -05:00
  • 3d7206e6ea Support both model names s6/bitnet2b_2501 Saood Karim 2025-04-21 21:47:17 -05:00
  • 2641658c90 Fixes Saood Karim 2025-04-21 04:31:58 -05:00
  • 356918048b add support for bitnet2b_2501 model potassiummmm 2025-03-12 18:16:03 +08:00
  • 16f945d9bb Fix termux/android build (#336) saood06 2025-04-21 02:13:46 -05:00
  • 93cd77b655 Fix termux/android build (#336) saood06 2025-04-21 02:13:46 -05:00
  • d75c151624 Attempt fix 13 s6/termux_fix Saood Karim 2025-04-21 01:56:05 -05:00
  • 8e82ee73cb Attempt fix 12 Saood Karim 2025-04-20 22:24:06 -05:00
  • cf6cfdc290 Attempt fix 11 Saood Karim 2025-04-20 22:18:03 -05:00
  • ca1aae058e Attempt fix 10 Saood Karim 2025-04-20 22:14:10 -05:00
  • 44ef87f07a Attempt fix 9 Saood Karim 2025-04-20 22:08:03 -05:00
  • 9aa1b15742 Attempt fix 8 Saood Karim 2025-04-20 02:15:02 -05:00
  • 16d2c206e9 Attempt fix 7 Saood Karim 2025-04-20 02:01:24 -05:00
  • 309e7ada7b Attempt fix 6 Saood Karim 2025-04-20 01:45:44 -05:00
  • f0dc73dabf Attempt fix 5 Saood Karim 2025-04-20 01:30:58 -05:00
  • b79a92cb29 Attempt fix 4 Saood Karim 2025-04-20 01:09:52 -05:00
  • 5c2380a55b Attempt fix 3 Saood Karim 2025-04-20 00:38:03 -05:00
  • 1882327e9b Attempt fix 2 Saood Karim 2025-04-20 00:34:15 -05:00
  • bb285c428b Attempt fix Saood Karim 2025-04-20 00:27:27 -05:00
  • 4a70adae94 Better TG performance for GQA models (CPU) (#332) Kawrakow 2025-04-17 08:08:40 +02:00
  • 3bb64d9330 Better TG performance for GQA models (CPU) (#332) Kawrakow 2025-04-17 08:08:40 +02:00
  • 3e41c56a8a Minor ik/tg_tweaks Iwan Kawrakow 2025-04-16 17:15:05 +03:00
  • 5fe73695b0 Better CPU FA implementation for TG when GQA Iwan Kawrakow 2025-04-16 15:02:59 +03:00
  • cec920940b Slightly better CPU TG performance for GQA Iwan Kawrakow 2025-04-16 11:29:46 +03:00
  • 5a98a66b5c Better gemm/gemv on AVX2 for q4_0_r8 (#331) Kawrakow 2025-04-15 17:18:50 +02:00
  • f7c5a94e75 Better gemm/gemv on AVX2 for q4_0_r8 (#331) Kawrakow 2025-04-15 17:18:50 +02:00
  • 3164fa3310 Better gemm/gemv on AVX2 for q4_0_r8 ik/faster_avx2_q40 Iwan Kawrakow 2025-04-15 18:12:22 +03:00
  • 1a786850e6 Allow q8_0 KV cache for head size 256 (#330) Kawrakow 2025-04-15 17:05:31 +02:00
  • 1bbb143eb3 Allow q8_0 KV cache for head size 256 (#330) Kawrakow 2025-04-15 17:05:31 +02:00
  • a164a50a36 We need also these ik/gemma_q80_kvcache Iwan Kawrakow 2025-04-15 13:56:29 +03:00
  • e86c0333a5 Allow q8_0 KV cache for head size 256 Iwan Kawrakow 2025-04-15 12:43:41 +03:00
  • 70a1d99fb8 imatrix: collect layer influence statistics (#328) Kawrakow 2025-04-14 19:43:19 +02:00
  • 05dbbeaf14 imatrix: collect layer influence statistics (#328) Kawrakow 2025-04-14 19:43:19 +02:00
  • 9be8812727 Add ability to hide imatrix details in llama-quantize (#329) Kawrakow 2025-04-14 19:41:31 +02:00
  • 028e0cfa19 Add ability to hide imatrix details in llama-quantize (#329) Kawrakow 2025-04-14 19:41:31 +02:00
  • 8bff04c9d6 Use stripped tensor name, not src0->name ik/imatrix_lsim Iwan Kawrakow 2025-04-14 19:00:06 +03:00
  • 4ed6076940 Add ability to hide imatrix details in llama-quantize ik/hide_imatrix Iwan Kawrakow 2025-04-14 16:36:57 +03:00
  • a891d49d59 imatrix: separate metric for attention and ffn importance Iwan Kawrakow 2025-04-14 16:26:31 +03:00
  • 02629c9ab9 imatrix: collect layer influence statistics also for the last layer Iwan Kawrakow 2025-04-14 11:48:28 +03:00
  • 34be9d8d57 imatrix: collect layer influence statistics Iwan Kawrakow 2025-04-14 10:03:39 +03:00
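A listing in this `hash message author date` shape can be reproduced from a checkout with `git log` and a custom pretty format. A minimal sketch (the exact flags used to generate the graph above are an assumption; `--abbrev=10` matches the 10-character hashes shown, and branch decorations such as `ik/fattn_work_buffer` would come from `%d` instead of being inlined):

```shell
# Sketch: print "  • <10-char hash> <subject> <author> <date>" for all commits,
# roughly matching the listing above. Assumes git is installed and you are
# inside a repository checkout.
git log --all --abbrev=10 \
    --date=format:'%Y-%m-%d %H:%M:%S %z' \
    --pretty=format:'  • %h %s %an %ad'
```

Here `%h` is the abbreviated hash, `%s` the commit subject, `%an` the author name, and `%ad` the author date rendered via `--date=format:`.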