Commit Graph

  • fb6a0d0184 iq1_s_r4: MMQ on CUDA ik/cuda_iq1_s_r4 Iwan Kawrakow 2025-06-04 15:11:17 +03:00
  • 33ced81cdf iq1_s_r4: CUDA GEMV Iwan Kawrakow 2025-06-04 12:17:34 +03:00
  • d34f72a567 iq1_s_r4: CUDA dequantize Iwan Kawrakow 2025-06-04 10:45:31 +03:00
  • 1d28b2a9a1 Adding top-n-sigma sampler (#489) Kawrakow 2025-06-03 17:35:09 +03:00
  • f6d5fbdc57 Adding top-n-sigma sampler (#489) Kawrakow 2025-06-03 17:35:09 +03:00
  • 106e326993 More README ik/sampling-top-n-sigma Iwan Kawrakow 2025-06-03 15:16:28 +03:00
  • dc8b4c82ba More README Iwan Kawrakow 2025-06-03 14:59:21 +03:00
  • f75ee6d4a7 Update README.md for main and server Iwan Kawrakow 2025-06-03 13:17:51 +03:00
  • 63c6f15599 Fix typos in XTC PR Iwan Kawrakow 2025-06-03 12:47:43 +03:00
  • 1ce02ebc01 Adding top-n-sigma sampler Iwan Kawrakow 2025-06-03 12:23:32 +03:00
  • accf69b126 Adding the XTC sampler (#486) Kawrakow 2025-06-03 11:32:03 +03:00
  • ccb265c016 Adding the XTC sampler (#486) Kawrakow 2025-06-03 11:32:03 +03:00
  • 62d5e5365b Also do the dequantize approach for iqk_moe_fused_up_gate ik/dequant_moe_gemm Iwan Kawrakow 2025-06-03 11:11:46 +03:00
  • feccbe0b9d Also do the dequantize approach for mul_mat_id Iwan Kawrakow 2025-06-03 10:50:09 +03:00
  • 7c5d9aba86 convert_hf_to_gguf.py : conversion from hf weights to Q6_0 (#483) Nexes the Elder 2025-06-03 08:30:30 +02:00
  • 4f8b05a0d7 convert_hf_to_gguf.py : conversion from hf weights to Q6_0 (#483) Nexes the Elder 2025-06-03 08:30:30 +02:00
  • 626f49ab84 Check if MMVQ is supported before using it. ik/mmvq_type_supported Iwan Kawrakow 2025-06-03 09:16:53 +03:00
  • d4b1a7f9c5 Adding the XTC sampler ik/sampling-xtc Iwan Kawrakow 2025-06-03 08:55:02 +03:00
  • 061d064b21 If available, use bf16 for iq4_kt gemm/gemv ik/trellis_bf16 Iwan Kawrakow 2025-06-02 11:59:20 +03:00
  • 0715919fc0 BF16 for iq4_kt Iwan Kawrakow 2025-06-02 11:17:18 +03:00
  • 9890618db4 If available, use bf16 for iq3_kt gemm/gemv Iwan Kawrakow 2025-06-02 07:14:54 +03:00
  • 62d8dd932b If available, use bf16 for iq2_kt gemm/gemv Iwan Kawrakow 2025-06-01 18:07:10 +03:00
  • 67b7ed49a4 Minor (~2%) iq2_ks TG performance improvement on CUDA (#468) Kawrakow 2025-06-01 15:24:33 +03:00
  • 7a8abe29f7 Minor (~2%) iq2_ks TG performance improvement on CUDA (#468) Kawrakow 2025-06-01 15:24:33 +03:00
  • b0bae0f0fa Trellis quants: faster CPU prompt processing (#482) Kawrakow 2025-06-01 15:24:05 +03:00
  • 3df1a3a44d Trellis quants: faster CPU prompt processing (#482) Kawrakow 2025-06-01 15:24:05 +03:00
  • 8d8f32b994 Metal implementatio for the trellis quants. (#475) Kawrakow 2025-06-01 15:23:44 +03:00
  • 35374bc7e8 Metal implementatio for the trellis quants. (#475) Kawrakow 2025-06-01 15:23:44 +03:00
  • a7fa24a6c5 Disable iq4_kt on Metal for now ik/trellis_metal Iwan Kawrakow 2025-06-01 15:21:19 +03:00
  • 0ae9a5450d F16 repacking attempt - slower on AVX2 ik/repack_f16 Iwan Kawrakow 2025-06-01 11:18:02 +03:00
  • d14eb93fb6 iq4_kt: Metal still not working Iwan Kawrakow 2025-06-01 08:06:11 +03:00
  • 079753abd7 Minor ik/dequant_gemm Iwan Kawrakow 2025-06-01 07:24:21 +03:00
  • 1a35af7251 Experimenting with dequant + f16 GEMM on NEON Iwan Kawrakow 2025-05-31 16:47:38 +03:00
  • cd4266eb58 Experimenting with dequant + f16 GEMM on NEON Iwan Kawrakow 2025-05-31 16:10:33 +03:00
  • 310b585af8 Experimenting with dequant + f32 GEMM Iwan Kawrakow 2025-05-31 11:46:05 +03:00
  • a7fb0fc3cc Experimenting with dequant + f32 GEMM Iwan Kawrakow 2025-05-31 11:08:27 +03:00
  • 63a8b2260e forgotten refs and typo (#478) Nexes the Elder 2025-05-31 06:36:50 +02:00
  • 7239ce6b35 forgotten refs and typo (#478) Nexes the Elder 2025-05-31 06:36:50 +02:00
  • b687b2b62e iq4_kt: Metal GEMV - also not working Iwan Kawrakow 2025-05-30 18:45:31 +03:00
  • 07663b2cf1 iq4_kt: Metal dequantize - getting NaNs Iwan Kawrakow 2025-05-30 17:53:09 +03:00
  • ad52554a5e iq3_kt: Metal GEMV Iwan Kawrakow 2025-05-30 13:21:25 +03:00
  • 2396cc3f88 iq3_kt: Metal dequantize Iwan Kawrakow 2025-05-30 12:32:55 +03:00
  • eeeca319dd iq2_kt: Metal GEMV Iwan Kawrakow 2025-05-30 11:39:59 +03:00
  • b3d223911c Replace MLA-specific KV cache with the standard KV cache (#469) Kawrakow 2025-05-30 11:08:17 +03:00
  • 2cf12eb12d Replace MLA-specific KV cache with the standard KV cache (#469) Kawrakow 2025-05-30 11:08:17 +03:00
  • df257a07e6 Replace MLA-specific KV cache with the standard KV cache V2 (#473) ik/remove_kv_l saood06 2025-05-30 02:28:27 -05:00
  • ae3816e13d Fix double print s6/remove_kv_l Saood Karim 2025-05-30 02:06:29 -05:00
  • 31c20e2c2d Fix save and restore when there is no V cache Saood Karim 2025-05-30 01:30:50 -05:00
  • 983844e95e iq2_kt: Metal dequantize Iwan Kawrakow 2025-05-30 07:52:09 +03:00
  • 0dea2e8a81 NEON implementation for trellis quants (#471) Kawrakow 2025-05-29 18:57:41 +03:00
  • 1eac9e8487 NEON implementation for trellis quants (#471) Kawrakow 2025-05-29 18:57:41 +03:00
  • 17dcd4dc89 iq4_kt: slightly faster TG on NEON ik/trellis_neon Iwan Kawrakow 2025-05-29 17:07:42 +03:00
  • 7a783af1ad Cleanup Iwan Kawrakow 2025-05-29 16:28:17 +03:00
  • cc395cf879 iq4_kt: NEON implementation Iwan Kawrakow 2025-05-29 15:30:15 +03:00
  • 1a203fdbc5 Send [DONE] for OAI compatibility ik/server_send_done Iwan Kawrakow 2025-05-29 07:33:05 +03:00
  • ac27355e3b Hopefully take care of missing V cache (MLA) Iwan Kawrakow 2025-05-28 14:16:01 +03:00
  • edd049b0d3 Remove kv_l, kvt_l and just use k_l and v_l Iwan Kawrakow 2025-05-28 13:43:42 +03:00
  • 9b97acd500 Minor (~2%) iq2_ks TG performance improvement on CUDA ik/minor_iq2ks_tweak Iwan Kawrakow 2025-05-28 13:17:18 +03:00
  • ccc00a4a56 set cache_prompt default to true (#465) saood06 2025-05-28 00:18:25 -05:00
  • ccd6d9cdf6 set cache_prompt default to true (#465) saood06 2025-05-28 00:18:25 -05:00
  • b033ca894b set cache_prompt default to true s6/cache_default Saood Karim 2025-05-27 19:39:02 -05:00
  • 6989ca0249 CUDA GEMM and GEMV for IQ4_KS_R4 and IQ5_KS_R4 (#462) Kawrakow 2025-05-27 08:37:44 +03:00
  • 0976467845 CUDA GEMM and GEMV for IQ4_KS_R4 and IQ5_KS_R4 (#462) Kawrakow 2025-05-27 08:37:44 +03:00
  • 64c754ba8b CUDA: iq5_ks_r4 GEMV and GEMM ik/cuda_iqk_ks_r4 Iwan Kawrakow 2025-05-26 19:30:31 +03:00
  • f0efb1f52a CUDA: iq4_ks_r4 GEMV and GEMM Iwan Kawrakow 2025-05-26 18:35:03 +03:00
  • 89728ab03c CUDA implementation for IQ2_K_R4, IQ3_K_R4, IQ4_K_R4, IQ5_K_R4 (#461) Kawrakow 2025-05-26 19:34:54 +03:00
  • 1429291326 CUDA implementation for IQ2_K_R4, IQ3_K_R4, IQ4_K_R4, IQ5_K_R4 (#461) Kawrakow 2025-05-26 19:34:54 +03:00
  • 1a8145e48e CUDA: faster iq2_k_r4 GEMV ik/cuda_iq4_k_r4 Iwan Kawrakow 2025-05-26 16:29:36 +03:00
  • fa011c9017 CUDA: iq2_k_r4 GEMV Iwan Kawrakow 2025-05-26 16:13:54 +03:00
  • aaf6d34789 CUDA: slightly faster iq3_k_r4 GEMV Iwan Kawrakow 2025-05-26 15:05:02 +03:00
  • 1faa7d977b CUDA: iq3_k_r4 GEMV Iwan Kawrakow 2025-05-26 14:49:58 +03:00
  • 395dc935fc CUDA: iq3_k_r4 dequantize Iwan Kawrakow 2025-05-26 14:12:03 +03:00
  • 4af604288d CUDA: iq5_k_r4 GEMV Iwan Kawrakow 2025-05-26 13:10:49 +03:00
  • 15adb7e0a7 CUDA: iq5_k_r4 dequantize Iwan Kawrakow 2025-05-26 12:42:47 +03:00
  • 5ac189d465 CUDA: slightly faster iq4_k_r4 GEMV Iwan Kawrakow 2025-05-26 11:31:54 +03:00
  • c6b711c8b0 CUDA: slightly faster iq4_k_r4 GEMV Iwan Kawrakow 2025-05-26 10:52:47 +03:00
  • 1bbd526a9c CUDA: iq4_k_r4 GEMV Iwan Kawrakow 2025-05-26 10:17:00 +03:00
  • 7d3332c6b9 CUDA: iq4_k_r4 dequantize Iwan Kawrakow 2025-05-26 08:23:17 +03:00
  • 639aee23c5 Add missing gguf-py constants (#458) Kawrakow 2025-05-25 09:55:36 +03:00
  • 24c010b391 Add missing gguf-py constants (#458) Kawrakow 2025-05-25 09:55:36 +03:00
  • 60043847c2 Add missing gguf-py constants ik/add_missing_gguf_constants Iwan Kawrakow 2025-05-25 09:53:53 +03:00
  • 465fe3b78d iq4_kt: not working NEON implementation Iwan Kawrakow 2025-05-25 08:58:29 +03:00
  • ceed25bf8a Remove GGML_IQK_MUL_MAT option ik/remove_iqk_option Iwan Kawrakow 2025-05-25 08:18:42 +03:00
  • e0fedaeb07 iq3_kt: NEON implementation Iwan Kawrakow 2025-05-24 18:48:51 +03:00
  • 5e684c1616 iq2_kt: NEON implementation Iwan Kawrakow 2025-05-24 18:25:24 +03:00
  • dad5464e34 Merge remote-tracking branch 'origin/main' into s6/fp8_native s6/fp8_native Saood Karim 2025-05-24 04:45:19 -05:00
  • 7486601f0a remove print Saood Karim 2025-05-24 03:49:43 -05:00
  • 86170b2048 Legacy quants conversion schemes in convert_hf_to_gguf.py (#449) Nexes the Elder 2025-05-24 10:49:10 +02:00
  • c7ecd4e23a Legacy quants conversion schemes in convert_hf_to_gguf.py (#449) Nexes the Elder 2025-05-24 10:49:10 +02:00
  • 82645c4be7 Faster IQ3_KT and IQ4_KT (#453) Kawrakow 2025-05-24 11:48:52 +03:00
  • a2c42f9985 Faster IQ3_KT and IQ4_KT (#453) Kawrakow 2025-05-24 11:48:52 +03:00
  • 16597a3ee2 Add fp8 GGUF creation Saood Karim 2025-05-24 03:48:06 -05:00
  • 3fe6c0a6e1 Very slightly faster iq4_kt TG ik/opt_kt_quants Iwan Kawrakow 2025-05-24 08:08:32 +03:00
  • 5929fafbed Cleanup Iwan Kawrakow 2025-05-24 07:53:33 +03:00
  • fb254f0c97 Slightly faster iq4_kt Iwan Kawrakow 2025-05-23 20:02:42 +03:00
  • 2994447021 Fix bug in MMVQ kernel (#446) Kawrakow 2025-05-23 18:25:11 +03:00
  • 9fb82af3a8 Fix bug in MMVQ kernel (#446) Kawrakow 2025-05-23 18:25:11 +03:00
  • 2440eca319 Fix MSVC compilation (#448) Kawrakow 2025-05-23 16:46:27 +03:00
  • 6b12c2e7e8 Fix MSVC compilation (#448) Kawrakow 2025-05-23 16:46:27 +03:00
  • 858f2a55a5 Arghhh ik/fix_447 Iwan Kawrakow 2025-05-23 16:26:03 +03:00