Commit Graph

  • 10929e6395 Update AUTHORS Kawrakow 2025-07-23 18:14:51 +02:00
  • 9ee72225dc Function calling support for Kimi-K2 (#628) Anton Sokolchenko 2025-07-23 18:11:42 +02:00
  • 3701fb1686 Function calling support for Kimi-K2 (#628) Anton Sokolchenko 2025-07-23 18:11:42 +02:00
  • 2c0d1b8bff iq4_kss: repack/convert to q8_k_r8 (NEON) ik/iq4_kss_improvements Iwan Kawrakow 2025-07-23 17:33:35 +02:00
  • 179e2d36f6 iq4_kss: repack/convert to q8_k_r8 (AVX2) Iwan Kawrakow 2025-07-23 18:19:30 +03:00
  • 8b72e9390d iq4_kss: CUDA MMQ Iwan Kawrakow 2025-07-23 17:22:28 +03:00
  • ba6f691a0c iq4_kss: slightly better quantization Iwan Kawrakow 2025-07-23 17:00:43 +03:00
  • eaa2510a28 Add GitHub data: filename sanitization (#640) Thomas 2025-07-23 13:31:53 +02:00
  • 0451f10a42 Add GitHub data: filename sanitization (#640) Thomas 2025-07-23 13:31:53 +02:00
  • 3600d82e98 Fix pauses after a comma (#639) Kawrakow 2025-07-23 11:45:58 +02:00
  • 7093a35869 Fix pauses after a comma (#639) Kawrakow 2025-07-23 11:45:58 +02:00
  • b685f9b4aa Fix pauses after a comma ik/fix_comma_pauses Iwan Kawrakow 2025-07-23 12:25:04 +03:00
  • 94aa54df76 Add GitHub data (#637) Thomas 2025-07-22 18:18:40 +02:00
  • ab7d193fe0 Add GitHub data (#637) Thomas 2025-07-22 18:18:40 +02:00
  • 9513222ba5 Revert "Update README.md" t0002 Kawrakow 2025-07-22 15:22:46 +03:00
  • 866c70e60d Revert "Update README.md" Iwan Kawrakow 2025-07-22 15:22:46 +03:00
  • 4ea000892d Add .mailmap Kawrakow 2025-07-22 14:51:00 +03:00
  • fa9441809a Add .mailmap Iwan Kawrakow 2025-07-22 14:51:00 +03:00
  • c3cd543d77 Update README.md Kawrakow 2025-07-22 09:01:59 +02:00
  • b48d71fec8 Update README.md ikawrakow 2025-07-22 09:01:59 +02:00
  • 18eeb48941 Webui: New Features for Conversations, Settings, and Chat Messages (#618) firecoperana 2025-07-20 05:33:55 -05:00
  • d44c2d3f5a Webui: New Features for Conversations, Settings, and Chat Messages (#618) firecoperana 2025-07-20 05:33:55 -05:00
  • e1164e1fd8 Adding IQ1_KT - 1.75 bpw SOTA quants (#616) Kawrakow 2025-07-20 10:05:23 +02:00
  • f989fb03bd Adding IQ1_KT - 1.75 bpw SOTA quants (#616) Kawrakow 2025-07-20 10:05:23 +02:00
  • d0bc1f8296 IQ1_M GEMM for ARM_NEON (#631) Kawrakow 2025-07-20 09:49:59 +02:00
  • 07673c6c33 IQ1_M GEMM for ARM_NEON (#631) Kawrakow 2025-07-20 09:49:59 +02:00
  • 6bc5812c21 Set repacking threshold ik/iq1_m_neon Iwan Kawrakow 2025-07-20 09:47:32 +02:00
  • 957b9e8339 iq1_m GEMM on NEON Iwan Kawrakow 2025-07-20 09:34:43 +02:00
  • 3da192ac33 Remove forgotten change Kawrakow 2025-07-18 20:11:57 +03:00
  • 38012f7290 Remove forgotten change Iwan Kawrakow 2025-07-18 20:11:57 +03:00
  • 712eb7b45c GEMM for iq1_m (#630) Kawrakow 2025-07-18 18:55:43 +02:00
  • cc82006f51 GEMM for iq1_m (#630) Kawrakow 2025-07-18 18:55:43 +02:00
  • 8800f62ccf GEMM for iq1_m ik/iq1m_gemm Iwan Kawrakow 2025-07-18 19:16:00 +03:00
  • 77eaa532c7 iq1_kt: add to constants.py ik/iq1_kt Iwan Kawrakow 2025-07-17 18:56:07 +03:00
  • 1f273689bb q5_K tweaks ik/quantization_tweaks Iwan Kawrakow 2025-07-17 18:03:24 +03:00
  • 597cdea43f q4_K tweaks Iwan Kawrakow 2025-07-17 17:50:17 +03:00
  • 060ed8ba13 q3_K tweaks Iwan Kawrakow 2025-07-17 17:28:41 +03:00
  • 8c944f29c5 q2_K tweaks Iwan Kawrakow 2025-07-17 16:52:39 +03:00
  • 912b74c151 Minor iq3_k tweak Iwan Kawrakow 2025-07-17 16:00:28 +03:00
  • ed5ed600bc iq3_ks quantization tweaks Iwan Kawrakow 2025-07-17 12:53:46 +03:00
  • 39ef8eeb9d Two things Iwan Kawrakow 2025-07-17 11:25:36 +03:00
  • cc51044e72 Add GGML_MAX_CONTEXTS definition in CMakeLists.txt (#622) Thireus ☠ 2025-07-17 07:50:42 +01:00
  • b94f3af56f Add GGML_MAX_CONTEXTS definition in CMakeLists.txt (#622) Thireus ☠ 2025-07-17 07:50:42 +01:00
  • eddeaac009 Bump Windows max open files from 512 to 2048 (#620) Thireus ☠ 2025-07-17 07:50:26 +01:00
  • 6950c82c30 Bump Windows max open files from 512 to 2048 (#620) Thireus ☠ 2025-07-17 07:50:26 +01:00
  • a65acba08a Adding frgotten file Iwan Kawrakow 2025-07-16 22:23:39 +03:00
  • 1755a76933 iq1_kt: very slightly faster convert/repack to q8_0_r8 on NEON Iwan Kawrakow 2025-07-16 15:20:18 +02:00
  • 410e9d25cd iq1_kt: convert/repack to q8_0_r8 (NEON) Iwan Kawrakow 2025-07-16 15:11:53 +02:00
  • 31554f534f iq1_kt: tiny bit better GEMV on NEON Iwan Kawrakow 2025-07-16 12:47:49 +02:00
  • 882fc0235e iq1_kt: slightly faster NEON - still pathetic Iwan Kawrakow 2025-07-16 12:33:11 +02:00
  • 3b6597c7a1 iq1_kt: NEON GEMM/GEMV Iwan Kawrakow 2025-07-16 11:51:15 +02:00
  • 6d1ddf1c26 iq1_kt: slightly faster GEMV Iwan Kawrakow 2025-07-16 11:22:09 +03:00
  • 572ed1e71c iq1_kt: convert/repack to q8_0_r8 (AVX2) Iwan Kawrakow 2025-07-16 08:28:05 +03:00
  • 224bfaf138 iq1_kt: AVX2 GEMM/GEMV Iwan Kawrakow 2025-07-16 07:57:25 +03:00
  • f90b75f078 iq1_kt: CUDA MMVQ Iwan Kawrakow 2025-07-15 19:59:33 +03:00
  • 0b5b9e3b68 iq1_kt: CUDA MMQ Iwan Kawrakow 2025-07-15 19:33:33 +03:00
  • 3da565c9c9 iq1_kt: CUDA dequantize Iwan Kawrakow 2025-07-15 19:06:50 +03:00
  • 4665e5b2f3 iq1_kt: basics Iwan Kawrakow 2025-07-15 17:51:49 +03:00
  • 5e357db589 Fixup kimi-k2 convert indentation (#617) ubergarm 2025-07-16 09:24:20 -04:00
  • c4fbced37d Fixup kimi-k2 convert indentation (#617) ubergarm 2025-07-16 09:24:20 -04:00
  • da38486de5 Bump GGML_MAX_CONTEXTS to allow loading more shards (#611) Thireus ☠ 2025-07-16 13:11:19 +01:00
  • 4803142300 Bump GGML_MAX_CONTEXTS to allow loading more shards (#611) Thireus ☠ 2025-07-16 13:11:19 +01:00
  • d3ed217798 kimi-k2 convert script and chat template (#612) ubergarm 2025-07-15 13:54:04 -04:00
  • 13b2f19372 kimi-k2 convert script and chat template (#612) ubergarm 2025-07-15 13:54:04 -04:00
  • 19c57dbe1d Vulkan: a fresh start (#608) Kawrakow 2025-07-15 08:03:13 +02:00
  • 2081b3fccb Vulkan: a fresh start (#608) Kawrakow 2025-07-15 08:03:13 +02:00
  • f375799f17 Adding IQ2_KL (#602) Kawrakow 2025-07-14 18:55:08 +02:00
  • 45fae1a144 Adding IQ2_KL (#602) Kawrakow 2025-07-14 18:55:08 +02:00
  • da8998c6c6 Ported kimi-k2 support from llama.cpp (#609) Aleksey Nikiforov 2025-07-14 12:43:52 -04:00
  • f5353047ef Ported kimi-k2 support from llama.cpp (#609) Aleksey Nikiforov 2025-07-14 12:43:52 -04:00
  • c462c5bdf6 q8_k_r8: AVX512 version ik/q8_k_r8_avx512 Iwan Kawrakow 2025-07-14 18:45:55 +03:00
  • 14ef9ebe9a Vulkan: fix u_batch > 4096/n_active_experts ik/vulkan_again Iwan Kawrakow 2025-07-14 17:28:55 +03:00
  • c7f3515a58 Vulkan needs f32 precision for flash attention Iwan Kawrakow 2025-07-14 14:42:58 +03:00
  • ae12c8b616 Seems to be working with coopmat Iwan Kawrakow 2025-07-14 13:33:18 +03:00
  • 495139a3e3 It compiles Iwan Kawrakow 2025-07-14 11:43:37 +03:00
  • f6d33e821e Add iq2_kl to constants.py ik/iq2_kl Iwan Kawrakow 2025-07-13 20:17:38 +03:00
  • c6d6467952 iq2_kl: slightly better Metal dequantize Iwan Kawrakow 2025-07-12 08:27:27 +02:00
  • 7da419b9c9 iq2_kl: slightly better Metal dequantize Iwan Kawrakow 2025-07-12 08:11:28 +02:00
  • e27708b341 iq2_kl: Metal GEMV - slightly better (46.5 t/s -> 47.2 t/s) Iwan Kawrakow 2025-07-12 07:46:09 +02:00
  • 1693cc6e60 iq2_kl: Metal GEMV - slightly better (44.5 t/s -> 46.5 t/s) Iwan Kawrakow 2025-07-12 07:34:35 +02:00
  • 1921d41675 iq2_kl: Metal GEMV - slightly better (40 t/s -> 44.5 t/s) Iwan Kawrakow 2025-07-12 07:23:46 +02:00
  • fe84026d43 iq2_kl: Metal GEMV - pretty slow Iwan Kawrakow 2025-07-11 19:41:09 +02:00
  • 3ff615fb76 iq2_kl: Metal dequantize Iwan Kawrakow 2025-07-11 18:41:12 +02:00
  • 278945ff84 iq2_kl: convert/repack to q8_k_r8 (NEON) Iwan Kawrakow 2025-07-11 17:43:10 +02:00
  • dd1c2a14d7 iq2_kl: NEON Iwan Kawrakow 2025-07-11 16:32:59 +02:00
  • b1956cd122 iq2_kl: WIP NEON Iwan Kawrakow 2025-07-11 15:50:47 +02:00
  • 4a3b5e3119 iq2_kl: AVX2 GEMM/GEMV Iwan Kawrakow 2025-07-11 15:40:53 +03:00
  • 738031ba0e iq2_kl: convert/repack to q8_k_r8 (AVX2) Iwan Kawrakow 2025-07-11 14:31:44 +03:00
  • b805f69c5a iq2_kl: better Zen4 Iwan Kawrakow 2025-07-11 13:09:58 +03:00
  • cd32c732f5 iq2_kl: Zen4 GEMM/GEMV Iwan Kawrakow 2025-07-11 12:02:44 +03:00
  • 23e9033f7b iq2_kl: MMVQ Iwan Kawrakow 2025-07-10 16:53:04 +03:00
  • 72f57c7f34 iq2_kl: MMQ Iwan Kawrakow 2025-07-10 16:10:07 +03:00
  • 29acfb6337 iq2_kl: small improvement in PPL Iwan Kawrakow 2025-07-10 14:53:54 +03:00
  • d0f85cb9c2 iq2_kl: CUDA dequantize Iwan Kawrakow 2025-07-10 14:22:37 +03:00
  • dfae22680b iq2_kl: basics Iwan Kawrakow 2025-07-10 12:40:07 +03:00
  • b7c986e4ff Experiments for 2.6875 bpw quants Iwan Kawrakow 2025-07-10 10:28:30 +03:00
  • 4f56069442 Add iq3_ks to constants.py (#606) Kawrakow 2025-07-13 19:14:26 +02:00
  • 255c22046b Add iq3_ks to constants.py (#606) Kawrakow 2025-07-13 19:14:26 +02:00
  • 9d7b8394e3 Add iq3_ks to constants.py ik/add_iq3ks_to_gguf Iwan Kawrakow 2025-07-13 20:13:15 +03:00
  • 276c045496 Add compression to server.cpp Saood Karim 2025-07-13 04:31:00 -05:00