Commit Graph

  • e2cf466eaa Fix attn_v conditionality (#604) Nexes the Elder 2025-07-13 11:28:18 +02:00
  • e2b1a5e1fc Fix attn_v conditionality (#604) Nexes the Elder 2025-07-13 11:28:18 +02:00
  • a6842ba601 Check if MMQ should be used before using it (#603) Kawrakow 2025-07-13 07:43:15 +02:00
  • b5ddec9516 Check if MMQ should be used before using it (#603) Kawrakow 2025-07-13 07:43:15 +02:00
  • 1bab0b3f46 Check if MMQ should be used before using it ik/fix_596 Iwan Kawrakow 2025-07-12 12:20:00 +03:00
  • 02d675717e Support for dots.llm1 models (#573) saood06 2025-07-10 02:37:36 -05:00
  • c53cb65251 Support for dots.llm1 models (#573) saood06 2025-07-10 02:37:36 -05:00
  • 4e2afbcd90 CUDA: Faster prompt processing for several quantization types (#595) Kawrakow 2025-07-10 09:27:28 +02:00
  • 283753cabc CUDA: Faster prompt processing for several quantization types (#595) Kawrakow 2025-07-10 09:27:28 +02:00
  • 660c33fa64 Remove commented lines s6/dots Saood Karim 2025-07-09 21:13:23 -05:00
  • a80e426c6f Minor s6/readme-minor2 saood06 2025-07-09 14:32:54 -05:00
  • 5795134fe7 Add back IQ1_S_R4 and IQ1_M_R4 saood06 2025-07-09 14:29:51 -05:00
  • ffd835b252 Minor fix Saood Karim 2025-07-09 13:38:56 -05:00
  • 46af019f81 Merge branch 'main' into s6/dots Saood Karim 2025-07-09 13:28:26 -05:00
  • 692dc0d9b5 Remove V reshaping, remove BOS by default for dots1 and fix warmup to handle models without BOS Saood Karim 2025-07-09 12:26:20 -05:00
  • db49223e8c add hunyuan moe support for 561 (#565) ubergarm 2025-07-09 04:29:40 -04:00
  • 5446ccc8ac add hunyuan moe support for 561 (#565) ubergarm 2025-07-09 04:29:40 -04:00
  • 62f5ab2cf2 cuda: slightly faster MMQ for iq4_xs ik/apply_cuda_faster_iq3k Iwan Kawrakow 2025-07-09 11:09:14 +03:00
  • 6836839907 cuda: slightly faster MMQ for iq4_ks Iwan Kawrakow 2025-07-09 10:46:21 +03:00
  • b5751170d3 cuda: slightly faster MMQ for iq4_ks_r4 Iwan Kawrakow 2025-07-09 10:39:44 +03:00
  • 234fab6fcb cuda: slightly faster MMQ for iq4_k, iq4_k_r4 Iwan Kawrakow 2025-07-09 10:26:29 +03:00
  • 89a94f978d cuda: slightly faster MMQ for iq3_k, iq3_k_r4 Iwan Kawrakow 2025-07-08 20:23:42 +03:00
  • 6a56d5075d Faster prompt processing for IQ2_KS, IQ2_K, IQ2_K_R4 (#593) Kawrakow 2025-07-08 19:44:48 +02:00
  • 97c34f4056 Faster prompt processing for IQ2_KS, IQ2_K, IQ2_K_R4 (#593) Kawrakow 2025-07-08 19:44:48 +02:00
  • a660980c43 Minor ik/cuda_faster_iq2k Iwan Kawrakow 2025-07-08 18:40:17 +03:00
  • 4a6d213446 Lookup is still better for MMQ if we get 4 values at once Iwan Kawrakow 2025-07-08 18:30:34 +03:00
  • 75cc3d08e8 cuda: faster MMQ for iq2_ks, iq2_k, iq2_k_r4 Iwan Kawrakow 2025-07-08 16:38:43 +03:00
  • bcda159e04 Fix double header saood06 2025-07-08 04:09:23 -05:00
  • e47db933e3 Update README.md with tables saood06 2025-07-08 04:02:08 -05:00
  • 61f74b2f8a Export All (code only, no UI) Saood Karim 2025-07-08 02:18:40 -05:00
  • 6970ef925f CUDA: small PP performance improvement for MoE models (#589) Kawrakow 2025-07-07 07:23:12 +02:00
  • 4c0b660266 CUDA: small PP performance improvement for MoE models (#589) Kawrakow 2025-07-07 07:23:12 +02:00
  • 5763029653 Remove selectedSessionId and handle it with URL fragment Saood Karim 2025-07-06 21:15:28 -05:00
  • efc440fb29 Minor ik/cuda_quantized_fmoe Iwan Kawrakow 2025-07-06 15:58:49 +03:00
  • 6c55ffa8ef quantize_mmq_q8_1_id Iwan Kawrakow 2025-07-06 13:16:31 +03:00
  • 27ff5bf57e Special handling of Seed Coder FIM tokens (#585) Fizz~ 2025-07-06 06:13:55 -04:00
  • 6f3a3ba7e2 Special handling of Seed Coder FIM tokens (#585) Fizz~ 2025-07-06 06:13:55 -04:00
  • 49d4d2630a Fix server crash when there is no DRY sampler (#588) firecoperana 2025-07-06 00:51:36 -05:00
  • 22f67917f6 Fix server crash when there is no DRY sampler (#588) firecoperana 2025-07-06 00:51:36 -05:00
  • b931e8b831 This works, but is slower than the non-working version Iwan Kawrakow 2025-07-05 20:09:15 +03:00
  • 030ba3aebf Trying to implement quantized fmoe - not working yet Iwan Kawrakow 2025-07-05 19:10:56 +03:00
  • 2fddc45a02 Vulkan: flash attention for DeepSeek models (#584) Kawrakow 2025-07-05 15:14:12 +02:00
  • 4622fadc2a Vulkan: flash attention for DeepSeek models (#584) Kawrakow 2025-07-05 15:14:12 +02:00
  • dc8f072e9c Update sigma step and max based on docs Saood Karim 2025-07-05 02:55:38 -05:00
  • 64550a365e Add sigma sampler Saood Karim 2025-07-05 02:43:50 -05:00
  • a1cef1df18 Add useful error message when launched without sql file Saood Karim 2025-07-05 01:24:43 -05:00
  • 64e2d76896 Fix the FA cherry-pick ik/vulkan_fattn Iwan Kawrakow 2025-07-04 11:35:00 +03:00
  • d924318899 vulkan: support mixed/deepseekR1 FA head sizes (#14509) Jeff Bolz 2025-07-03 13:21:14 -05:00
  • b8784686e1 Adding forgotten file (#583) Kawrakow 2025-07-04 08:39:04 +02:00
  • 0678427f82 Adding forgotten file (#583) Kawrakow 2025-07-04 08:39:04 +02:00
  • 084bb8f8a7 Adding forgotten file ik/add_forgotten_multi_add Iwan Kawrakow 2025-07-04 09:36:03 +03:00
  • 28e81fc761 Vulkan: adding GGML_OP_MULTI_ADD implementation (#582) Kawrakow 2025-07-04 08:33:43 +02:00
  • 235c989e39 Vulkan: adding GGML_OP_MULTI_ADD implementation (#582) Kawrakow 2025-07-04 08:33:43 +02:00
  • d14d099641 Vulkan: adding GGML_OP_MULTI_ADD implementation ik/vulkan_multi_add Iwan Kawrakow 2025-07-04 09:08:24 +03:00
  • 93b7724bbb Vulkan: Disable multi-add for now (#581) Kawrakow 2025-07-03 18:31:48 +02:00
  • 3e024de1da Vulkan: Disable multi-add for now (#581) Kawrakow 2025-07-03 18:31:48 +02:00
  • 86f0654ef0 Vulkan: Disable multi-add for now ik/vulkan_disable_multi_add Iwan Kawrakow 2025-07-03 19:06:49 +03:00
  • 8d4f0a61db Vulkan: add GGML_OP_FUSED_MUL_UNARY (#580) Kawrakow 2025-07-03 18:03:23 +02:00
  • 8a0c38f496 Vulkan: add GGML_OP_FUSED_MUL_UNARY (#580) Kawrakow 2025-07-03 18:03:23 +02:00
  • e30c5d9841 Vulkan: add GGML_OP_FUSED_MUL_UNARY ik/vulkan_fused_mul_unary Iwan Kawrakow 2025-07-03 18:02:31 +03:00
  • b445c83eb9 Vulkan: fused rms norm (#577) Kawrakow 2025-07-03 15:36:52 +02:00
  • 9534461c01 Vulkan: fused rms norm (#577) Kawrakow 2025-07-03 15:36:52 +02:00
  • 1db6a073cb Do not crash when there is no DRY sampler (#578) Kawrakow 2025-07-03 15:26:52 +02:00
  • db8dee5051 Do not crash when there is no DRY sampler (#578) Kawrakow 2025-07-03 15:26:52 +02:00
  • 2fb0b26a8f Fix debug build failure with RPC off (#579) Kawrakow 2025-07-03 15:26:28 +02:00
  • ab22474d77 Fix debug build failure with RPC off (#579) Kawrakow 2025-07-03 15:26:28 +02:00
  • 6f4a804ad2 Fix debug build failure with RPC off ik/fix_rpc_off Iwan Kawrakow 2025-07-03 16:25:20 +03:00
  • fe9926708a Do not crash when there is no DRY sampler ik/fix_missing_dry Iwan Kawrakow 2025-07-03 15:43:19 +03:00
  • f9ea039f48 Vulkan: fused rms norm ik/vulkan_fused_rms Iwan Kawrakow 2025-07-03 15:23:27 +03:00
  • c482d14b12 Change KQ mask padding to 64 (#574) Kawrakow 2025-07-03 10:43:27 +02:00
  • de7a4403b0 Change KQ mask padding to 64 (#574) Kawrakow 2025-07-03 10:43:27 +02:00
  • f2bff69347 Change KQ mask padding to 64 ik/kq_mask_padding_64 Iwan Kawrakow 2025-07-03 11:26:28 +03:00
  • b5bc8dcde7 Fix to make it convert Saood Karim 2025-07-02 21:18:53 -05:00
  • eb88e8f4e7 Add python changes for dots1 support Saood Karim 2025-07-02 21:04:15 -05:00
  • c834b6d0b0 Add llama.cpp changes for dots1 support Saood Karim 2025-07-02 20:57:30 -05:00
  • 59e3f4ffe7 Fix CMakeLists (#571) Kawrakow 2025-07-02 16:11:56 +02:00
  • c9148ba0b4 Fix CMakeLists (#571) Kawrakow 2025-07-02 16:11:56 +02:00
  • a4a7334aad Minor ik/fix_vulkan_required Iwan Kawrakow 2025-07-02 17:08:58 +03:00
  • 7166c65083 Move Vulkan stuff inside if (GGML_VULKAN) Iwan Kawrakow 2025-07-02 17:06:05 +03:00
  • adc28f8852 Adding IQ3_KS quants (#566) Kawrakow 2025-07-02 09:27:47 +02:00
  • 3248a35992 Adding IQ3_KS quants (#566) Kawrakow 2025-07-02 09:27:47 +02:00
  • 59967f3d64 iq3_ks: Metal gemv - pathetic performance ik/iq3_ks_v2 Iwan Kawrakow 2025-07-01 12:50:57 +02:00
  • f7b3c07c92 iq3_ks: Metal dequantize Iwan Kawrakow 2025-07-01 11:51:30 +02:00
  • cb09eb5c19 iq3_ks: NEON convert to q8_k_r8 Iwan Kawrakow 2025-07-01 11:29:27 +02:00
  • 8fb48218cb iq3_ks: NEON GEMM/GEMV Iwan Kawrakow 2025-07-01 11:10:15 +02:00
  • 3e6341d72a iq3_ks: AVX2 GEMM/GEMV Iwan Kawrakow 2025-07-01 11:12:05 +03:00
  • 4fca652130 iq3_ks: AVX2 convert to q8_k_r8 Iwan Kawrakow 2025-07-01 10:03:16 +03:00
  • c421fa3012 iq3_ks: Zen4 Iwan Kawrakow 2025-07-01 09:18:19 +03:00
  • bc6a52815c iq3_ks: faster mmq Iwan Kawrakow 2025-06-30 19:53:01 +03:00
  • 05e456acd3 iq3_ks: mmq Iwan Kawrakow 2025-06-30 19:14:35 +03:00
  • a2a134673d iq3_ks: CUDA mmvq Iwan Kawrakow 2025-06-30 18:25:23 +03:00
  • 73c10b8243 iq3_ks: CUDA dequantize Iwan Kawrakow 2025-06-30 17:45:20 +03:00
  • 6740284ede iq3_ks: basics Iwan Kawrakow 2025-06-30 16:41:39 +03:00
  • 6215d9315c Minor CUDA PP speed improvement (#567) Kawrakow 2025-07-02 09:11:33 +02:00
  • 46f2e5d249 Minor CUDA PP speed improvement (#567) Kawrakow 2025-07-02 09:11:33 +02:00
  • 8a71405f5f Conditionally disable fused ops when building with Vulkan enabled (#569) Kawrakow 2025-07-02 08:59:04 +02:00
  • b2566759a9 Conditionally disable fused ops when building with Vulkan enabled (#569) Kawrakow 2025-07-02 08:59:04 +02:00
  • d9fd346cb6 Conditionally disable fused ops when building with Vulkan enabled ik/vulkan_disable_fused_ops Iwan Kawrakow 2025-07-02 09:56:58 +03:00
  • a3e78038d0 Merge vulkan code from mainline up to commit of 6/28/2025 (#563) firecoperana 2025-07-02 01:49:42 -05:00
  • d5cd99f9c8 Merge vulkan code from mainline up to commit of 6/28/2025 (#563) firecoperana 2025-07-02 01:49:42 -05:00