Commit Graph

  • 48c0b7d8f9 Slightly faster iq4_kt Iwan Kawrakow 2025-05-23 16:07:21 +03:00
  • f015390efa MSVC cannot capture constexpr in lambdas Iwan Kawrakow 2025-05-23 15:52:45 +03:00
  • acc0b5cfbc Fix MSVC compilation Iwan Kawrakow 2025-05-23 15:04:19 +03:00
  • 31988c7eea Cleanup Iwan Kawrakow 2025-05-23 13:32:50 +03:00
  • 1500060f27 Somewhat faster iq3_kt (AVX2) Iwan Kawrakow 2025-05-23 13:19:47 +03:00
  • 193a15b465 Fix bug in MMVQ kernel ik/fix_mmvq_bug Iwan Kawrakow 2025-05-23 08:02:47 +03:00
  • 00d67b9059 Fix typo in non-AVX2 code branch (#445) Kawrakow 2025-05-23 12:02:54 +03:00
  • 7f2edd1a85 Fix typo in non-AVX2 code branch (#445) Kawrakow 2025-05-23 12:02:54 +03:00
  • 745973f294 Fix typo in non-AVX2 code branch ik/fix_typo Iwan Kawrakow 2025-05-23 12:00:23 +03:00
  • 25d34e3d2f Trellis quants with CPU inference (#441) Andrew Chan 2025-05-22 23:17:52 -07:00
  • a1c931c30c Trellis quants with CPU inference (#441) Andrew Chan 2025-05-22 23:17:52 -07:00
  • 3c4f887b10 gguf-split : update (#444) Nexes the Elder 2025-05-23 07:07:42 +02:00
  • 3efdd6df67 gguf-split : update (#444) Nexes the Elder 2025-05-23 07:07:42 +02:00
  • b79be8a191 Fix bug in MMVQ kernel ik/cuda_tracer Iwan Kawrakow 2025-05-23 08:02:47 +03:00
  • 39e712620d Streamline a bit the quant strategies (#443) Nexes the Elder 2025-05-22 17:04:47 +02:00
  • ec4563221e Streamline a bit the quant strategies (#443) Nexes the Elder 2025-05-22 17:04:47 +02:00
  • aa40ee491f Synchronize after each mul_mat in up_gate Iwan Kawrakow 2025-05-22 18:02:05 +03:00
  • 43d4302909 Checks in mul_mat_id Iwan Kawrakow 2025-05-22 17:30:20 +03:00
  • 26553e8488 Also add nb Iwan Kawrakow 2025-05-22 16:53:43 +03:00
  • 68d0dbf76a Print tensor info on error in fused_up_gate Iwan Kawrakow 2025-05-22 16:38:08 +03:00
  • 4446e8ca5c Also for matrix x vector Iwan Kawrakow 2025-05-22 14:05:35 +03:00
  • 11e674af0e Add cudaStreamSynchronize() at the end of fused up gate Iwan Kawrakow 2025-05-22 13:22:52 +03:00
  • 2ca7e29d7d Try enabling CUDA peer to peer copy Iwan Kawrakow 2025-05-22 11:04:36 +03:00
  • f925678ac8 Another peer to peer copy check Iwan Kawrakow 2025-05-22 10:33:18 +03:00
  • 1683e8e720 Refactor iqk_mul_mat.cpp (#435) Kawrakow 2025-05-22 10:05:51 +03:00
  • b94cd3b632 Refactor iqk_mul_mat.cpp (#435) Kawrakow 2025-05-22 10:05:51 +03:00
  • 21d9a8b471 Just in case, call ggml_cuda_set_device Iwan Kawrakow 2025-05-22 09:42:21 +03:00
  • d6c3d4b4a8 Add check if device to device access is enabled Iwan Kawrakow 2025-05-22 09:30:29 +03:00
  • e87bfe39ba Make sure we are on the corrent device before synchronizing Iwan Kawrakow 2025-05-22 08:37:15 +03:00
  • 8b339b1453 Trace synchronize calls from the back-end Iwan Kawrakow 2025-05-21 18:13:30 +03:00
  • aa6738a8f4 Just in case, set computed wkv_b vew source to null Iwan Kawrakow 2025-05-21 17:49:39 +03:00
  • 1375c21deb Slighty faster iq2_kt ik/andrew_trellis Iwan Kawrakow 2025-05-21 17:16:51 +03:00
  • 2b3a24d3c7 Move the asserts where they are neded Iwan Kawrakow 2025-05-21 16:38:11 +03:00
  • 23ae313bd3 Asserts in ggm-backend.c Iwan Kawrakow 2025-05-21 16:36:47 +03:00
  • 74247a0f3d Add some additional asserts Iwan Kawrakow 2025-05-21 16:12:28 +03:00
  • fa068f52a9 Add CUDA_CHECK to MMQ and MMVQ ops Iwan Kawrakow 2025-05-21 15:34:54 +03:00
  • 38692aaab4 Some performance tweaks Iwan Kawrakow 2025-05-21 13:39:45 +03:00
  • a8e3e33503 CUDA call tracer Iwan Kawrakow 2025-05-21 08:40:25 +03:00
  • 60a948bf44 delete unused and noncompiling mmvq functions Andrew Keen Chan 2025-05-20 16:55:45 +00:00
  • 82871cc2a3 Another attempt to fix the illegal memory access bug ik/desperate_bug_fix_attempt Iwan Kawrakow 2025-05-20 14:28:42 +03:00
  • dcc5ab31f1 Bug fixes from mainline (#439) Kawrakow 2025-05-20 17:03:14 +03:00
  • a2b5057a0c Bug fixes from mainline (#439) Kawrakow 2025-05-20 17:03:14 +03:00
  • 5252e95552 Clearing padding ik/cuda_mailine_fixes Iwan Kawrakow 2025-05-20 16:59:31 +03:00
  • 651fed0848 Add __syncthreads() to the new FA kernel Iwan Kawrakow 2025-05-20 15:34:27 +03:00
  • 0943331ec9 Fix q6_0 K cache ik/refactor_iqk Iwan Kawrakow 2025-05-20 10:46:36 +03:00
  • 46712586b3 Merge remote-tracking branch 'origin/main' into andrewkchan/try_trellis Andrew Keen Chan 2025-05-20 06:52:54 +00:00
  • 9ceef4941b more cleanup Andrew Keen Chan 2025-05-20 06:48:38 +00:00
  • aefab2eec1 Merge branch 'main' into andrewkchan/try_trellis Andrew Keen Chan 2025-05-20 06:48:14 +00:00
  • d5eb74d719 cleanup Andrew Keen Chan 2025-05-20 06:29:12 +00:00
  • 922b22f1e9 naming and remove unused fn Andrew Keen Chan 2025-05-20 06:12:59 +00:00
  • cb29146fbe fix (0.22t/s eval) Andrew Keen Chan 2025-05-20 06:08:30 +00:00
  • 5351ec0442 Fix q8_0 on NEON Iwan Kawrakow 2025-05-20 08:47:58 +03:00
  • 4fdb50b785 Fix iq4_ks on NEON Iwan Kawrakow 2025-05-20 07:59:45 +03:00
  • 7090f171e1 Fix iq4_k_r4 on NEON Iwan Kawrakow 2025-05-19 19:46:44 +03:00
  • 06efa17fa9 Adding forgotten iq5_k_r4 Iwan Kawrakow 2025-05-19 18:00:16 +03:00
  • 380ab3f33a Forgotten MMQ ref and typo (#431) Nexes the Elder 2025-05-18 16:36:41 +02:00
  • 65c8e860bf Refactor iqk: FA refactored (NEON) Iwan Kawrakow 2025-05-19 17:16:00 +03:00
  • 9ae8f75114 Fix bf16 Iwan Kawrakow 2025-05-19 15:30:46 +03:00
  • 9541631a52 Most helpers don't need to be templates Iwan Kawrakow 2025-05-19 15:20:43 +03:00
  • fbfe79e2fe Adding forgotten file Iwan Kawrakow 2025-05-19 13:42:20 +03:00
  • 630279cb54 Refactor iqk: FA refactored (Zen4) Iwan Kawrakow 2025-05-19 13:38:38 +03:00
  • 131e5ac6df Refactor iqk: FA compiles Iwan Kawrakow 2025-05-19 11:43:02 +03:00
  • 103345a872 wip buggy iq4_KT Andrew Keen Chan 2025-05-19 08:24:10 +00:00
  • 4b4b4fdcac Refactor iqk: GEMM kernels are refactored on NEON Iwan Kawrakow 2025-05-19 08:36:16 +03:00
  • 7aa2de6d5a Refactor iqk: factor out repacked iqk quants (NEON) Iwan Kawrakow 2025-05-19 08:25:56 +03:00
  • 7e59d2b974 Refactor iqk: factor out repacked k-quants (NEON) Iwan Kawrakow 2025-05-19 08:11:24 +03:00
  • 2b8a231d87 Refactor iqk: factor out repacked legacy quants (NEON) Iwan Kawrakow 2025-05-19 07:51:28 +03:00
  • 04eb150b9f iq3_kt (0.3t/s eval) and renames Andrew Keen Chan 2025-05-19 03:03:05 +00:00
  • c4e5d3e382 flatten 3inst iters + avx2 (0.3t/s eval) Andrew Keen Chan 2025-05-18 23:21:15 +00:00
  • addac77278 still super slow (0.17t/s eval) Andrew Keen Chan 2025-05-18 21:35:38 +00:00
  • 7561158313 WIP - working basic iq2_kt Andrew Keen Chan 2025-05-18 21:07:22 +00:00
  • bd1e4d4909 Refactor iqk: factor out legacy quants (NEON) Iwan Kawrakow 2025-05-18 19:47:53 +03:00
  • 465d717bb9 Refactor iqk: factor out iqk quants (NEON) Iwan Kawrakow 2025-05-18 19:06:46 +03:00
  • 312413694f Also iq4_xs belongs to k-quants Iwan Kawrakow 2025-05-18 18:14:45 +03:00
  • f4ab917e9e Refactor iqk: factor out floats (NEON) Iwan Kawrakow 2025-05-18 18:09:39 +03:00
  • c805a19202 Refactor iqk: factor out k-quants (NEON) Iwan Kawrakow 2025-05-18 17:41:54 +03:00
  • e19ecd296b Forgotten MMQ ref and typo (#431) Nexes the Elder 2025-05-18 16:36:41 +02:00
  • 2ec2229f2e Forgotten MMQ ref and typo (#431) Nexes the Elder 2025-05-18 16:36:41 +02:00
  • 28b94800c1 Refactor iqk: factor out 1-bit quants (NEON) Iwan Kawrakow 2025-05-18 16:54:44 +03:00
  • c63a0af5b7 Refactor iqk: GEMM kernels are refactored on AVX2/AVX512 Iwan Kawrakow 2025-05-18 15:50:20 +03:00
  • 0d96f3bd37 Refactor iqk: Factor out GEMM for repacked i-quants Iwan Kawrakow 2025-05-18 14:51:59 +03:00
  • f501200d42 Refactor iqk: Factor out GEMM for q8_K_R8, q8_KV Iwan Kawrakow 2025-05-18 14:02:07 +03:00
  • 6cd3609a85 Refactor iqk: Factor out GEMM for repacked legacy quants Iwan Kawrakow 2025-05-18 10:20:54 +03:00
  • 3cc0de96a6 WIP for IQ2_KT Andrew Keen Chan 2025-05-18 06:56:39 +00:00
  • 97ce7edb62 Disable multi-add for now ik/disable_multi_add Iwan Kawrakow 2025-05-18 08:36:12 +03:00
  • 7868545062 Refactor iqk: Factor out GEMM for iq1_bn, iq2_bn, iq2_bn_r4 Iwan Kawrakow 2025-05-17 19:53:48 +03:00
  • d66ec60836 Refactor iqk: fix AVX2 Iwan Kawrakow 2025-05-17 19:29:55 +03:00
  • 9b6e75cb79 Refactor iqk: Factor out GEMM for 1-bit quants (ABX2/AVX512) Iwan Kawrakow 2025-05-17 18:28:24 +03:00
  • 082a9bd632 Refactor iqk: fix AVX2 Iwan Kawrakow 2025-05-17 17:45:32 +03:00
  • de5660cee3 Refactor iqk: Factor out GEMM for iqk-quants (AVX2/AVX512) Iwan Kawrakow 2025-05-17 17:34:34 +03:00
  • 8dae13cd84 Refactor iqk: fix AVX2 Iwan Kawrakow 2025-05-17 16:43:53 +03:00
  • 2cbbc5581f Refactor iqk: Factor out GEMM for i-quants (AVX2/AVX512) Iwan Kawrakow 2025-05-17 16:34:25 +03:00
  • d355ff997b Refactor iqk: fix AVX2 Iwan Kawrakow 2025-05-17 15:45:15 +03:00
  • 4ef94c26fb Refactor iqk: Factor out GEMM for k-quants (AVX2/AVX512) Iwan Kawrakow 2025-05-17 15:34:56 +03:00
  • f83e64dcb6 Refactor iqk: Factor out GEMM for legacy quants (AVX2/AVX512) Iwan Kawrakow 2025-05-17 14:32:00 +03:00
  • 51a87cf20d Refactor iqk: Factor out float GEMM (AVX2/AVX512) Iwan Kawrakow 2025-05-17 13:41:39 +03:00
  • 68b782e861 Refactor iqk: WIP Iwan Kawrakow 2025-05-17 12:31:39 +03:00
  • 7e0ac477b8 Option to enable disable the IQK CPU FA kernels (#429) Kawrakow 2025-05-17 11:21:58 +03:00
  • b3036a872f Option to enable disable the IQK CPU FA kernels (#429) Kawrakow 2025-05-17 11:21:58 +03:00
  • 8c56fb3a72 Option to enable disable the IQK CPU FA kernels ik/option_cpu_fa Iwan Kawrakow 2025-05-17 11:03:10 +03:00