Commit Graph

  • 0f7aa11b6d Improved IQ1_M quantization (#327) Kawrakow 2025-04-13 10:37:55 +02:00
  • d210661c91 Improved IQ1_M quantization (#327) Kawrakow 2025-04-13 10:37:55 +02:00
  • 4291d7e1e6 Minor ik/improve_iq1m Iwan Kawrakow 2025-04-13 08:34:43 +03:00
  • 29773692f1 Cleanup Iwan Kawrakow 2025-04-13 08:11:36 +03:00
  • 514637ed47 Much faster and it looks like better iq1_m quantization Iwan Kawrakow 2025-04-12 18:55:48 +03:00
  • 65fc77c285 Fix KLD precision (#325) Kawrakow 2025-04-12 16:17:50 +02:00
  • c01449a478 Fix KLD precision (#325) Kawrakow 2025-04-12 16:17:50 +02:00
  • 9b24ae7fc6 Fix KLD precision ik/fix_kld Iwan Kawrakow 2025-04-12 10:01:20 +03:00
  • a3b16affaf Correct L4 rms_norm (#324) Kawrakow 2025-04-11 10:49:18 +02:00
  • 3e7be3d28e Correct L4 rms_norm (#324) Kawrakow 2025-04-11 10:49:18 +02:00
  • c5f1a0ad25 Correct L4 rms_norm ik/l4_rms_norm Iwan Kawrakow 2025-04-11 11:45:33 +03:00
  • 5c127b279f LlaMA-4 support (text only) (#321) Kawrakow 2025-04-10 09:05:21 +02:00
  • 474435f58b LlaMA-4 support (text only) (#321) Kawrakow 2025-04-10 09:05:21 +02:00
  • b51661bbff llama4: this seems to be working ik/llama4 Iwan Kawrakow 2025-04-09 12:02:22 +03:00
  • 7a1fc34023 llama4: WIP Iwan Kawrakow 2025-04-09 09:56:45 +03:00
  • 80846bb2c9 WIP ik/improve_iq2ks Iwan Kawrakow 2025-04-08 17:16:16 +03:00
  • 346295577a WIP - LLaMA-2 is slightly better Iwan Kawrakow 2025-04-08 16:16:57 +03:00
  • c50d00d0dc Guard against attempts to use MLA for non-MLA models (#320) Kawrakow 2025-04-08 08:47:24 +02:00
  • 5f44f4b3d0 Guard against attempts to use MLA for non-MLA models (#320) Kawrakow 2025-04-08 08:47:24 +02:00
  • 5ec2cb63ae Guard against attempts to use MLA for non-MLA models ik/mla_guard Iwan Kawrakow 2025-04-08 09:45:08 +03:00
  • d223b26daf Update AUTHORS Kawrakow 2025-04-07 17:31:35 +02:00
  • 22d7440ba2 Update AUTHORS Kawrakow 2025-04-07 17:31:35 +02:00
  • 21725684a4 Update AUTHORS Kawrakow 2025-04-07 17:28:19 +02:00
  • f03ae19aad Update AUTHORS Kawrakow 2025-04-07 17:28:19 +02:00
  • 16b36613d5 Use links for ggml/llama.cpp authors (#318) Kawrakow 2025-04-07 17:25:06 +02:00
  • b38759127a Use links for ggml/llama.cpp authors (#318) Kawrakow 2025-04-07 17:25:06 +02:00
  • ae7cf9a766 More ik/update_license Iwan Kawrakow 2025-04-07 17:58:09 +03:00
  • 2ca1ab007c This file is not html Iwan Kawrakow 2025-04-07 16:22:15 +03:00
  • 9cc2374796 Use links for ggml/llama.cpp authors Iwan Kawrakow 2025-04-07 16:12:11 +03:00
  • 86c9b08846 Better iq2_xs quantization (#312) Kawrakow 2025-04-07 12:39:04 +02:00
  • 2309ecda80 Better iq2_xs quantization (#312) Kawrakow 2025-04-07 12:39:04 +02:00
  • 8210ed4883 Add copyright notices (#317) Kawrakow 2025-04-07 10:43:26 +02:00
  • a051f08b8f Add copyright notices (#317) Kawrakow 2025-04-07 10:43:26 +02:00
  • 9bd4357cbc Update LICENSE Kawrakow 2025-04-07 10:41:40 +02:00
  • abbabf7ca1 Update LICENSE Kawrakow 2025-04-07 10:41:40 +02:00
  • 8b9be1a048 Add copyright notices ik/copyright Iwan Kawrakow 2025-04-07 11:19:54 +03:00
  • 0dbcd57267 Try not repacking q8_0 for FA computations ik/try_fa_no_q80_repack Iwan Kawrakow 2025-04-06 09:49:59 +03:00
  • d3c0cc788b We need to synchronize before using device to host async memcpy (#313) Kawrakow 2025-04-05 14:31:27 +02:00
  • ec84855c6a We need to synchronize before using device to host async memcpy (#313) Kawrakow 2025-04-05 14:31:27 +02:00
  • c2bab6cee5 We need to synchronize before using device to host async memcpy ik/fix_cuda_memcpy_async Iwan Kawrakow 2025-04-05 15:28:20 +03:00
  • fe157dee95 Better iq2_xs quantization ik/improve_iq2_xs Iwan Kawrakow 2025-04-05 11:51:26 +03:00
  • c7fceae221 Add -flax-vector-conversions for GCC on ARM (#311) Kawrakow 2025-04-04 11:04:19 +02:00
  • c616306a01 Add -flax-vector-conversions for GCC on ARM (#311) Kawrakow 2025-04-04 11:04:19 +02:00
  • 8b711a5880 Add -flax-vector-conversions for GCC on ARM ik/flax-vector-conversions Iwan Kawrakow 2025-04-04 11:03:02 +02:00
  • 9ab6dc9f91 Metal: FA and FlashMLA (#310) Kawrakow 2025-04-03 17:54:25 +02:00
  • 073eda985e Metal: FA and FlashMLA (#310) Kawrakow 2025-04-03 17:54:25 +02:00
  • 2596f7b856 Metal FA: MLA options now all work ik/metal_fattn_update Iwan Kawrakow 2025-04-03 17:03:52 +02:00
  • 53c2e1489c WIP Iwan Kawrakow 2025-04-03 16:29:40 +02:00
  • 1f260865ef Fix GCC compilation errors on ARM (#309) Kawrakow 2025-04-03 15:50:53 +02:00
  • 2ee6263e24 Fix GCC compilation errors on ARM (#309) Kawrakow 2025-04-03 15:50:53 +02:00
  • 310bce3c1d One more ik/fix_gcc_arm Iwan Kawrakow 2025-04-03 13:10:53 +02:00
  • 9c7af40ff6 Fix GCC compilation errors on ARM Iwan Kawrakow 2025-04-03 11:02:14 +02:00
  • 2cb31c6592 Metal FA: go to float Iwan Kawrakow 2025-04-03 09:54:33 +02:00
  • e757dea3bf Metal: WIP to update Metal FA implementation Iwan Kawrakow 2025-04-03 09:12:00 +02:00
  • 3b5da96073 Metal: much faster MoE prompt processing (#307) Kawrakow 2025-04-03 07:15:49 +02:00
  • 07dbc1aa06 Metal: much faster MoE prompt processing (#307) Kawrakow 2025-04-03 07:15:49 +02:00
  • f258e60e60 Much better ik/metal_moe Iwan Kawrakow 2025-04-02 19:57:40 +02:00
  • d9d372249e Some cleanup Iwan Kawrakow 2025-04-02 18:39:50 +02:00
  • 2a5552830b MoE improvements on Metal Iwan Kawrakow 2025-04-02 15:26:19 +02:00
  • 79db2e243f docs: update README.md (#304) Ikko Eltociear Ashimine 2025-04-02 04:30:25 +09:00
  • 6d405d1fd1 docs: update README.md (#304) Ikko Eltociear Ashimine 2025-04-02 04:30:25 +09:00
  • df20261b6a Fix ARM_NEON build failure due to q8_2 (#303) Kawrakow 2025-04-01 13:48:20 +02:00
  • 21a5b8bd28 Fix ARM_NEON build failure due to q8_2 (#303) Kawrakow 2025-04-01 13:48:20 +02:00
  • c8dee5b35d Fix ARM_NEON build failure due to q8_2 ik/fix_neon_q82 Iwan Kawrakow 2025-04-01 13:45:33 +02:00
  • 1bc60d6cc9 Quantization improvements (2) (#302) Kawrakow 2025-04-01 10:31:06 +02:00
  • 190e7866db Quantization improvements (2) (#302) Kawrakow 2025-04-01 10:31:06 +02:00
  • a630958fb4 Additional guards for interleaved quants (#299) Kawrakow 2025-04-01 08:29:47 +02:00
  • b07a337bfe Additional guards for interleaved quants (#299) Kawrakow 2025-04-01 08:29:47 +02:00
  • ba3030c9c3 Fix #300 (#301) Kawrakow 2025-04-01 08:29:25 +02:00
  • 6e5156cab5 Fix #300 (#301) Kawrakow 2025-04-01 08:29:25 +02:00
  • 86513a6e06 Small improvement for type-1 quants ik/iqk_q_improvements Iwan Kawrakow 2025-03-31 16:33:18 +03:00
  • 55688c49aa Fix #300 ik/fix_300 Iwan Kawrakow 2025-03-31 15:22:08 +03:00
  • 7d55051faa Simplify ik/interleaved_guards Iwan Kawrakow 2025-03-31 13:46:20 +03:00
  • 48fcd53bcd Make sure no interleaved quants are being used for token embeddings Iwan Kawrakow 2025-03-31 12:44:40 +03:00
  • a7f026eebb Update LlamaFileType Saood Karim 2025-03-31 02:12:45 -05:00
  • e98daabcf1 Update GGMLQuantizationType Saood Karim 2025-03-31 01:15:45 -05:00
  • 56860314c6 iq3_k: slightly better quantization Iwan Kawrakow 2025-03-29 09:12:45 +02:00
  • 3c3825d7f6 Quantization improvements (#295) Kawrakow 2025-03-29 08:09:52 +01:00
  • 4819257ce6 Quantization improvements (#295) Kawrakow 2025-03-29 08:09:52 +01:00
  • b9c25fe753 Same for iq4_nl, iq4_xs ik/make_qx_quants Iwan Kawrakow 2025-03-28 07:20:56 +02:00
  • c8d47fab04 Better make_qx_quants Iwan Kawrakow 2025-03-27 19:35:43 +02:00
  • 9898f480fe Make sure tensor row size is multiple of block size also when quantizing with --pure (#294) Kawrakow 2025-03-27 10:48:52 +01:00
  • 23b0addb34 Make sure tensor row size is multiple of block size also when quantizing with --pure (#294) Kawrakow 2025-03-27 10:48:52 +01:00
  • 7e706abdc1 Merge remote-tracking branch 'origin/main' into ik/change_q_pure ik/change_q_pure Iwan Kawrakow 2025-03-27 11:47:29 +02:00
  • 75687d2af3 Add check if selected type is possible with --pure Iwan Kawrakow 2025-03-27 09:17:34 +02:00
  • d71e84bdc1 Use bf16 instead of fp16 block scales for q8_1 (#292) Kawrakow 2025-03-27 05:49:16 +01:00
  • d0b52076da Use bf16 instead of fp16 block scales for q8_1 (#292) Kawrakow 2025-03-27 05:49:16 +01:00
  • 918abd1683 q8_0_r8 on avx2 ik/use_q8_2 Iwan Kawrakow 2025-03-26 19:19:43 +02:00
  • b428bf14b8 Also q4_1 and q5_1 Iwan Kawrakow 2025-03-26 18:52:12 +02:00
  • f1b4762ed7 q6_0_r4 Iwan Kawrakow 2025-03-26 18:31:22 +02:00
  • a4d7fb77fd q5_0_r4 Iwan Kawrakow 2025-03-26 18:08:44 +02:00
  • 0170c8f93f q4_0_r8 Iwan Kawrakow 2025-03-26 17:57:28 +02:00
  • 9ce890ecac Use bf16 instead of f16,int16 Iwan Kawrakow 2025-03-26 17:36:49 +02:00
  • 8e2d549c68 It works for q8_0 Iwan Kawrakow 2025-03-26 13:52:53 +02:00
  • 40ab112869 q8_0 without bells and whistles works Iwan Kawrakow 2025-03-26 12:49:29 +02:00
  • 970c16458a WIP - not working Iwan Kawrakow 2025-03-26 12:32:32 +02:00
  • 2089147ae0 Disable Zen4 optimizations for Q8_0/Q8_0_R8 ik/test_q80_NaNs Iwan Kawrakow 2025-03-26 08:25:34 +02:00
  • f31aca2d40 Whitespace s6/numa_KV Saood Karim 2025-03-25 14:30:11 -05:00
  • cc8c0e1b49 More cleanup Saood Karim 2025-03-25 14:29:00 -05:00
  • 109f5c0cd8 Cleanup Saood Karim 2025-03-25 14:23:11 -05:00