Commit Graph

  • ddc7ec4ac7 Fix case with empty sql Saood Karim 2025-06-30 14:51:55 -05:00
  • 3a634c7a5c Make version endpoint always accessible Saood Karim 2025-06-27 20:34:33 -05:00
  • 66c8dc416d Update version number and add features array to version Saood Karim 2025-06-27 20:15:40 -05:00
  • 8fd4774f94 Remove hardcoded extension and add error handling to extension loading Saood Karim 2025-06-27 18:36:10 -05:00
  • 036b5bb822 Minor ik/improve_mmq Iwan Kawrakow 2025-06-27 19:00:58 +03:00
  • 1e6e4833bc Slightly better q8_0_q8_1 kernel and iqk_ks tile loading Iwan Kawrakow 2025-06-27 18:26:32 +03:00
  • 305fe4b160 Remove what appears to be unnecessary asserts in ggml_cuda_cpy (#560) Kawrakow 2025-06-27 17:44:36 +02:00
  • bce7697d64 Remove what appears to be unnecessary asserts in ggml_cuda_cpy (#560) Kawrakow 2025-06-27 17:44:36 +02:00
  • 17d2db910f Use cuBLAS for large batches and quants with block size 16 (#559) Kawrakow 2025-06-27 17:43:51 +02:00
  • 31bd3185f2 Use cuBLAS for large batches and quants with block size 16 (#559) Kawrakow 2025-06-27 17:43:51 +02:00
  • 409bfe6648 Remove what appears to be unnecessary asserts in ggml_cuda_cpy ik/cuda_large_cpy Iwan Kawrakow 2025-06-26 20:27:50 +03:00
  • 3dbc84377f Use cuBLAS for large batches and quants with block size 16 ik/mmq_to_cublas Iwan Kawrakow 2025-06-26 14:13:00 +03:00
  • 4c7579e617 mikupad.html in ik_llama.cpp (functional but WIP) Saood Karim 2025-06-26 02:14:20 -05:00
  • 2fb1db705e CUDA: MMQ for iqX_r4 quants (#557) Kawrakow 2025-06-26 08:50:49 +02:00
  • 5236c98b41 CUDA: MMQ for iqX_r4 quants (#557) Kawrakow 2025-06-26 08:50:49 +02:00
  • c8e6d9cfe7 Add Falcon-Edge support (#555) Kawrakow 2025-06-26 08:48:52 +02:00
  • 8e5106b20f Add Falcon-Edge support (#555) Kawrakow 2025-06-26 08:48:52 +02:00
  • b3417c9366 iqk_r4 quants: use MMQ only for batches < 1024 tokens ik/cuda_iqk_r4 Iwan Kawrakow 2025-06-25 14:47:59 +03:00
  • 9b273bf437 cuda: MMQ for iq5_k_r4 Iwan Kawrakow 2025-06-25 13:35:47 +03:00
  • 9e2c94083f cuda: MMQ for iq4_k_r4 Iwan Kawrakow 2025-06-25 13:12:37 +03:00
  • e87a319f5f cuda: MMQ for iq3_k_r4 Iwan Kawrakow 2025-06-25 12:42:37 +03:00
  • 4c4db19c46 cuda: MMQ for iq2_k_r4 Iwan Kawrakow 2025-06-25 12:10:31 +03:00
  • b74bd33b6a Add Falcon-Edge support ik/falcon_edge Iwan Kawrakow 2025-06-25 10:11:37 +03:00
  • b8402290ef Much faster prompt processing for IQ1_S and IQ1_M on ARM_NEON (#553) Kawrakow 2025-06-24 14:21:37 +02:00
  • b5f2f00106 Much faster prompt processing for IQ1_S and IQ1_M on ARM_NEON (#553) Kawrakow 2025-06-24 14:21:37 +02:00
  • e5e5acfdda iq1_m ik/gemm_neon_1bit Iwan Kawrakow 2025-06-24 14:05:30 +02:00
  • 3c5b788b70 iq1_s Iwan Kawrakow 2025-06-24 13:27:25 +02:00
  • 58d2cbf948 Much faster prompt processing for k-quants (ARM_NEON) (#552) Kawrakow 2025-06-24 13:05:01 +02:00
  • 64f6c2dead Much faster prompt processing for k-quants (ARM_NEON) (#552) Kawrakow 2025-06-24 13:05:01 +02:00
  • e18b10bc48 Merge remote-tracking branch 'origin/main' into ik/gemm_neon_kquants ik/gemm_neon_kquants Iwan Kawrakow 2025-06-24 13:03:02 +02:00
  • c3c60c3b40 iq4_xs Iwan Kawrakow 2025-06-24 11:25:07 +02:00
  • 915a4a31cc q5_k Iwan Kawrakow 2025-06-24 11:06:45 +02:00
  • d1b4b34a79 q4_k Iwan Kawrakow 2025-06-24 10:16:04 +02:00
  • 78d531c04a q6_k Iwan Kawrakow 2025-06-24 08:59:28 +02:00
  • 52ad57b042 q3_K Iwan Kawrakow 2025-06-23 18:38:31 +02:00
  • 6818e14184 q2_k Iwan Kawrakow 2025-06-23 17:50:26 +02:00
  • 8c7e1b72d7 Much faster prompt processing for I-quants (ARM_NEON) (#550) Kawrakow 2025-06-23 15:50:24 +02:00
  • ddda4d9e64 Much faster prompt processing for I-quants (ARM_NEON) (#550) Kawrakow 2025-06-23 15:50:24 +02:00
  • 548a5f3f0d iq3_s ik/gemm_neon_iquants Iwan Kawrakow 2025-06-23 15:34:41 +02:00
  • 26965677e8 iq3_xxs Iwan Kawrakow 2025-06-23 15:09:46 +02:00
  • c52f58915b iq2_s Iwan Kawrakow 2025-06-23 14:42:49 +02:00
  • 8b3318676b iq2_xs Iwan Kawrakow 2025-06-23 14:14:45 +02:00
  • edb5f9ca8b iq2_xxs Iwan Kawrakow 2025-06-23 13:50:38 +02:00
  • 032723a333 Much faster prompt processing for IQK quants (ARM_NEON) (#549) Kawrakow 2025-06-23 11:55:50 +02:00
  • 4776dd2809 Much faster prompt processing for IQK quants (ARM_NEON) (#549) Kawrakow 2025-06-23 11:55:50 +02:00
  • ec4b4536ea iq2_k ik/gemm_neon_iqk Iwan Kawrakow 2025-06-23 11:15:25 +02:00
  • c15688ca3a iq3_k Iwan Kawrakow 2025-06-23 10:55:22 +02:00
  • a15cfa0401 iq4_k Iwan Kawrakow 2025-06-23 09:59:52 +02:00
  • 54dfc404e7 iq5_k Iwan Kawrakow 2025-06-23 09:31:11 +02:00
  • 425bbe7609 iq6_k Iwan Kawrakow 2025-06-22 19:31:45 +02:00
  • eeea8af04f iq5_ks Iwan Kawrakow 2025-06-22 18:49:42 +02:00
  • 26ebe8dd44 Faster GEMM for iq2_ks, iq4_ks Iwan Kawrakow 2025-06-22 18:02:09 +02:00
  • fed6448640 To use GGML_ABORT we need to include ggml-impl.h. Kawrakow 2025-06-22 17:49:32 +03:00
  • cac763fc20 To use GGML_ABORT we need to include ggml-impl.h. Iwan Kawrakow 2025-06-22 17:49:32 +03:00
  • 258c111d84 Abort if IQK_IMPLEMENT is not defined Kawrakow 2025-06-22 16:49:38 +03:00
  • 22d6817d1e Abort if IQK_IMPLEMENT is not defined Iwan Kawrakow 2025-06-22 16:49:38 +03:00
  • d3ca8c23cf Faster ARM_NEON GEMM implementation for legacy quants (#546) Kawrakow 2025-06-21 16:35:08 +02:00
  • 4f97409b80 Faster ARM_NEON GEMM implementation for legacy quants (#546) Kawrakow 2025-06-21 16:35:08 +02:00
  • 0386bfb19b Perhaps slightly faster trellis quants (#541) Kawrakow 2025-06-21 16:32:16 +02:00
  • a98b7678a3 Perhaps slightly faster trellis quants (#541) Kawrakow 2025-06-21 16:32:16 +02:00
  • aaa164773d q5_1 ik/gemm_neon_legacy Iwan Kawrakow 2025-06-21 16:00:22 +02:00
  • ce4fb5863e q4_1 Iwan Kawrakow 2025-06-21 15:50:21 +02:00
  • 8b102792d8 iq4_nl Iwan Kawrakow 2025-06-21 15:13:45 +02:00
  • a78bed0f8e q8_0 Iwan Kawrakow 2025-06-21 12:43:30 +02:00
  • a834e4be15 q6_0 Iwan Kawrakow 2025-06-21 12:33:52 +02:00
  • f8efac6295 q5_0 Iwan Kawrakow 2025-06-21 12:21:02 +02:00
  • 1f31789b96 q4_0 Iwan Kawrakow 2025-06-21 11:58:38 +02:00
  • a0ba58e9b9 iq2_kt and iq3_kt work with new int trellis ik/metal_new_trellis Iwan Kawrakow 2025-06-20 10:47:22 +03:00
  • 5b677c3caf Enable next_128() also on AVX2 ik/trellis_opt Iwan Kawrakow 2025-06-20 08:34:45 +03:00
  • 1287a66627 With fancy simd also set func16 Iwan Kawrakow 2025-06-20 08:06:25 +03:00
  • 1dfc023fef Cleanup Iwan Kawrakow 2025-06-19 17:27:54 +03:00
  • 6c0b796435 WIP Iwan Kawrakow 2025-06-19 16:57:29 +03:00
  • 8fcede9c34 This looks better for iq4_kt TG Iwan Kawrakow 2025-06-19 15:40:38 +03:00
  • 14578c3dce This seems slightly faster for IQ2_KT, IQ3_KT TG Iwan Kawrakow 2025-06-19 14:51:15 +03:00
  • 293203b3dd New integer trellis on ARM_NEON (#544) Kawrakow 2025-06-20 09:26:36 +03:00
  • 1843ed22c5 New integer trellis on ARM_NEON (#544) Kawrakow 2025-06-20 09:26:36 +03:00
  • a45e368444 iq3_kt is now working on NEON ik/neon_iq3_kt Iwan Kawrakow 2025-06-20 07:25:48 +03:00
  • 031eadab1d Adapt iq3_kt to new trellis on NEON Iwan Kawrakow 2025-06-19 19:38:30 +03:00
  • 1d6e143ecb Fix NEON build (#542) Kawrakow 2025-06-19 18:37:22 +03:00
  • 144ee1c4c6 Fix NEON build (#542) Kawrakow 2025-06-19 18:37:22 +03:00
  • 1e534df8ea Fix NEON build ik/fix_neon_build Iwan Kawrakow 2025-06-19 18:35:16 +03:00
  • d1f92e24d3 add dry sampler (#513) firecoperana 2025-06-19 02:24:53 -05:00
  • 3f111ad7bb add dry sampler (#513) firecoperana 2025-06-19 02:24:53 -05:00
  • 638fb80e8a Minor readme update (#535) saood06 2025-06-19 02:18:39 -05:00
  • c5368148cf Minor readme update (#535) saood06 2025-06-19 02:18:39 -05:00
  • d595dfaa2f Update CMakeLists.txt to fix NDEBUG handling (#537) Anton Sokolchenko 2025-06-19 09:18:21 +02:00
  • 39e17589a2 Update CMakeLists.txt to fix NDEBUG handling (#537) Anton Sokolchenko 2025-06-19 09:18:21 +02:00
  • 08d2100c07 Fix missed block_q8_x2 bf16 -> i16 change (#540) Kawrakow 2025-06-19 09:35:36 +03:00
  • c6166b4020 Fix missed block_q8_x2 bf16 -> i16 change (#540) Kawrakow 2025-06-19 09:35:36 +03:00
  • 19ac0b595c Fix missed block_q8_x2 bf16 -> i16 change ik/fix_538 Iwan Kawrakow 2025-06-19 09:02:42 +03:00
  • 829a4d7177 move thing fix s6/readme-minor1 Saood Karim 2025-06-18 12:36:41 -05:00
  • ed17aa1015 move thing Saood Karim 2025-06-18 12:35:55 -05:00
  • 4fa4aeec5d Fix KT Neon / ARM typo (#536) Louie Helm 2025-06-18 16:55:02 +00:00
  • 0ade534305 Fix KT Neon / ARM typo (#536) Louie Helm 2025-06-18 16:55:02 +00:00
  • ea8a9019cd move thing Saood Karim 2025-06-18 11:47:39 -05:00
  • e31481b834 Condense CUDA implementations. Saood Karim 2025-06-18 11:42:35 -05:00
  • 87105fc3b4 Fix MSVC compilation error Kawrakow 2025-06-18 16:48:36 +03:00
  • 7479c2a3e5 Fix MSVC compilation error Iwan Kawrakow 2025-06-18 16:48:36 +03:00
  • d345a15a84 New IQ2_KT, IQ3_KT and IQ4_KT, V2 (#529) Kawrakow 2025-06-18 16:20:54 +03:00
  • d85c64428e New IQ2_KT, IQ3_KT and IQ4_KT, V2 (#529) Kawrakow 2025-06-18 16:20:54 +03:00