Commit Graph

  • 634da2f0c9 Zen4: Faster PP for IQ2_KS, IQ4_KS, IQ5_KS (#428) Kawrakow 2025-05-17 10:42:33 +03:00
  • c35a383bcd Zen4: Faster PP for IQ2_KS, IQ4_KS, IQ5_KS (#428) Kawrakow 2025-05-17 10:42:33 +03:00
  • d7ebb3eae4 Zen4: faster PP for iq2_ks ik/zen4_faster_iq4ks_iq5ks Iwan Kawrakow 2025-05-17 10:22:38 +03:00
  • 2f557a0fd6 Zen4: faster PP for iq4_ks and iq5_ks Iwan Kawrakow 2025-05-17 09:48:11 +03:00
  • db111c91ee IQ5_KS_R4: row-interleaved IQ5_KS (#426) Kawrakow 2025-05-17 08:57:26 +03:00
  • 7abdf2b099 IQ5_KS_R4: row-interleaved IQ5_KS (#426) Kawrakow 2025-05-17 08:57:26 +03:00
  • 9dd452d4cb Fix iq5_ks on NEON ik/iq5_ks_r4 Iwan Kawrakow 2025-05-16 10:15:17 +03:00
  • 2881f5fd7c iq5_ks_r4: NEON Iwan Kawrakow 2025-05-16 08:49:21 +03:00
  • 2b040aef21 iq5_ks_r4: AVX2 works Iwan Kawrakow 2025-05-15 20:03:13 +03:00
  • 25ce0f0372 iq5_ks_r4: Zen4 works Iwan Kawrakow 2025-05-15 19:45:42 +03:00
  • 48ebde8b8b iq5_ks_r4: basics Iwan Kawrakow 2025-05-15 18:39:38 +03:00
  • e31ba05fcd Fix AVX2 implementation of IQ4_K, IQ4_KS, IQ5_K, IQ6_K (#427) Kawrakow 2025-05-16 17:25:15 +03:00
  • 134d548173 Fix AVX2 implementation of IQ4_K, IQ4_KS, IQ5_K, IQ6_K (#427) Kawrakow 2025-05-16 17:25:15 +03:00
  • 177dd173d6 Fix IQ6_K on AVX2 ik/fix_iq4k_avx2 Iwan Kawrakow 2025-05-16 16:49:26 +03:00
  • 2b6c050cca Fix IQ5_K on AVX2 Iwan Kawrakow 2025-05-16 16:14:34 +03:00
  • 1ddbcd3106 Fix IQ4_KS on AVX2 Iwan Kawrakow 2025-05-16 15:47:02 +03:00
  • 2fe2630b7d Fix IQ4_K on AVX2 Iwan Kawrakow 2025-05-16 14:57:28 +03:00
  • 06532ebd0e Adding forgotten template instance for iq5_ks (#424) Kawrakow 2025-05-15 16:50:15 +03:00
  • 34ae71c4d7 Adding forgotten template instance for iq5_ks (#424) Kawrakow 2025-05-15 16:50:15 +03:00
  • 349a697654 Adding forgotten template instance for iq5_ks ik/add_missing_mmq_iq5ks Iwan Kawrakow 2025-05-15 16:48:20 +03:00
  • 90e53a0b8b Adding IQ5_KS - 5.25 bpw quants (#422) Kawrakow 2025-05-15 16:02:39 +03:00
  • 3d92d7f802 Adding IQ5_KS - 5.25 bpw quants (#422) Kawrakow 2025-05-15 16:02:39 +03:00
  • a7ceba3dc6 iq5_ks: Metal dot product ik/iq5_ks Iwan Kawrakow 2025-05-15 15:48:29 +03:00
  • cf93e69f0f iq5_ks: Metal dequantize Iwan Kawrakow 2025-05-15 15:10:58 +03:00
  • b8db611a92 iq5_ks: NEON Iwan Kawrakow 2025-05-15 13:56:00 +03:00
  • e2ecb1a0a3 iq5_ks: AVX2 Iwan Kawrakow 2025-05-15 13:35:36 +03:00
  • 65b9d3302e iq5_ks: Zen4 Iwan Kawrakow 2025-05-15 12:34:20 +03:00
  • f0355f2522 iq5_ks: MMQ works Iwan Kawrakow 2025-05-15 11:44:32 +03:00
  • 31ecbaa478 iq5_ks: dot product works on CUDA Iwan Kawrakow 2025-05-15 10:42:07 +03:00
  • ecfbaba74b iq5_ks: CUDA dequantize works Iwan Kawrakow 2025-05-15 10:22:18 +03:00
  • d6eb80d9ee iq5_ks: quantize Iwan Kawrakow 2025-05-15 09:44:37 +03:00
  • 560820cdb4 iq5_ks: basics Iwan Kawrakow 2025-05-15 09:38:23 +03:00
  • 17d721820a Fix standard attention on the CPU (#421) Kawrakow 2025-05-15 08:43:39 +03:00
  • 3f8c865b92 Fix standard attention on the CPU (#421) Kawrakow 2025-05-15 08:43:39 +03:00
  • ab6077718f Fix standard attention on the CPU ik/fix_standard_attention_cpu Iwan Kawrakow 2025-05-15 08:40:47 +03:00
  • 5e31a7df43 CUDA: quantized GEMM for for IQ2_KS, IQ2_K, IQ3_K (#418) Kawrakow 2025-05-15 08:15:08 +03:00
  • 14ed9fb44d CUDA: quantized GEMM for for IQ2_KS, IQ2_K, IQ3_K (#418) Kawrakow 2025-05-15 08:15:08 +03:00
  • 217905c8b3 Fix iq2_ks ik/cuda_mmq_iq2_k Iwan Kawrakow 2025-05-14 19:03:54 +03:00
  • f069a57817 MMQ for iq2_ks Iwan Kawrakow 2025-05-14 18:41:42 +03:00
  • 1aceccae35 MMQ for iq3_k Iwan Kawrakow 2025-05-14 17:06:27 +03:00
  • b44eaaa145 This works Iwan Kawrakow 2025-05-14 16:09:15 +03:00
  • 92b765dc64 MMQ for iq2_k Iwan Kawrakow 2025-05-14 15:47:37 +03:00
  • 51db1bf2d2 CUDA: quantized GEMM for for IQ4_K, IQ5_K, IQ6_K (#417) Kawrakow 2025-05-14 14:04:11 +03:00
  • 0435b68e6d CUDA: quantized GEMM for for IQ4_K, IQ5_K, IQ6_K (#417) Kawrakow 2025-05-14 14:04:11 +03:00
  • d91316e475 MMQ for iq6_k ik/cuda_mmq_iq4_k Iwan Kawrakow 2025-05-14 12:34:22 +03:00
  • 0bdbf33184 MMQ for iq5_k: slightly faster Iwan Kawrakow 2025-05-14 12:07:32 +03:00
  • 775b9091cb Cleanup Iwan Kawrakow 2025-05-14 11:53:11 +03:00
  • 7ec38dab8a MMQ for iq5_k Iwan Kawrakow 2025-05-14 11:28:01 +03:00
  • 5376413185 MMQ for iq4_k: working now Iwan Kawrakow 2025-05-14 08:16:57 +03:00
  • f7802849b4 MMQ for iq4_k: WIP (not working) Iwan Kawrakow 2025-05-09 18:28:24 +03:00
  • fba62d61c0 Fix SER (CUDA) (#416) Kawrakow 2025-05-14 07:29:28 +03:00
  • b90d6ede2e Fix SER (CUDA) (#416) Kawrakow 2025-05-14 07:29:28 +03:00
  • 79bdbbb3c0 This seems to work ik/fix_ser_cuda Iwan Kawrakow 2025-05-13 20:01:54 +03:00
  • e78e7f5e6e This seems to fix it. Iwan Kawrakow 2025-05-13 17:10:05 +03:00
  • 564cbbaa7a Cleanup Iwan Kawrakow 2025-05-13 15:04:37 +03:00
  • c5b914ad4f Fixing SER bugs Iwan Kawrakow 2025-05-13 14:55:57 +03:00
  • d002b9b4a0 Fix SER (CPU) (#415) Kawrakow 2025-05-13 17:55:04 +03:00
  • 13740622e9 Fix SER (CPU) (#415) Kawrakow 2025-05-13 17:55:04 +03:00
  • 4071472bdc Fix imatrix calculation for MLA models (#411) Kawrakow 2025-05-13 17:53:38 +03:00
  • 0c57f84dc4 Fix imatrix calculation for MLA models (#411) Kawrakow 2025-05-13 17:53:38 +03:00
  • 86dbdea6fc Better CPU FA performance for DeepSeek-Lite (#410) Kawrakow 2025-05-13 17:53:20 +03:00
  • 553c08b6b4 Better CPU FA performance for DeepSeek-Lite (#410) Kawrakow 2025-05-13 17:53:20 +03:00
  • 2c18ef1400 Cleanup ik/fix_ser Iwan Kawrakow 2025-05-13 15:04:37 +03:00
  • 10ae78e08a Fixing SER bugs Iwan Kawrakow 2025-05-13 14:55:57 +03:00
  • 537f72f9cc Update README.md Kawrakow 2025-05-12 15:48:37 +03:00
  • 4ba6bbb44a Update README.md Kawrakow 2025-05-12 15:48:37 +03:00
  • be1d5c4b7e Fix new CUDA FA on Touring (#413) Kawrakow 2025-05-12 15:09:33 +03:00
  • 627f406437 Fix new CUDA FA on Touring (#413) Kawrakow 2025-05-12 15:09:33 +03:00
  • d2362176df Fix new CUDA FA on Touring ik/fix_412 Iwan Kawrakow 2025-05-12 15:01:35 +03:00
  • 902024a64c Fix imatrix calculation for MLA models ik/fix_mla_imatrix Iwan Kawrakow 2025-05-12 13:20:22 +03:00
  • 83dab6a7ce It must be like this ik/cpu_deepseek_fa Iwan Kawrakow 2025-05-12 10:19:28 +03:00
  • 7b120637d9 Better CPU FA performance for DeepSeek-Lite Iwan Kawrakow 2025-05-12 09:25:48 +03:00
  • ceb8f513e4 Add batch warmup to sweep-bench (#375) Kawrakow 2025-05-12 07:50:26 +03:00
  • 1d2da7feae Add batch warmup to sweep-bench (#375) Kawrakow 2025-05-12 07:50:26 +03:00
  • 2e585d4508 Enable faster prompt processing with mainline llama.cpp GGUFs (#409) Kawrakow 2025-05-12 07:49:51 +03:00
  • f27cd40542 Enable faster prompt processing with mainline llama.cpp GGUFs (#409) Kawrakow 2025-05-12 07:49:51 +03:00
  • 0c02e16a39 Faster DeepSeek FA on CUDA (#408) Kawrakow 2025-05-12 07:49:00 +03:00
  • 465569dff8 Faster DeepSeek FA on CUDA (#408) Kawrakow 2025-05-12 07:49:00 +03:00
  • aa8ec5dfa6 GPU offload policy (#405) Kawrakow 2025-05-12 07:47:46 +03:00
  • 8669c3db2b GPU offload policy (#405) Kawrakow 2025-05-12 07:47:46 +03:00
  • 999d991152 Add newly created tensors to model.tensors_by_name ik/enable_mla3_in_crippled_ggufs Iwan Kawrakow 2025-05-11 18:03:22 +03:00
  • bf12612941 Enable MLA-3 in crippled GGUFs: seems to work Iwan Kawrakow 2025-05-11 16:49:40 +03:00
  • 8ee5008f7e Enable MLA-3 in crippled GGUFs: WIP Iwan Kawrakow 2025-05-11 14:18:24 +03:00
  • 8f7bd74afb Revert "Fix race in the CUDA DeepSeek FA kernel (#406)" Kawrakow 2025-05-11 12:22:19 +03:00
  • 504fb890d9 Revert "Fix race in the CUDA DeepSeek FA kernel (#406)" Iwan Kawrakow 2025-05-11 12:22:19 +03:00
  • d7008ad52d constexpr and minor changes ik/cuda_flash_mla3_v2 Iwan Kawrakow 2025-05-11 11:21:51 +03:00
  • d1601d463b Rearrange MLA K cache so it first new CUDA FA implementation Iwan Kawrakow 2025-05-11 10:48:26 +03:00
  • 130cdf2715 New DeepSeek FlashMLA Iwan Kawrakow 2025-05-11 09:58:03 +03:00
  • 0abcf0749e Fix race in the CUDA DeepSeek FA kernel (#406) Kawrakow 2025-05-11 08:12:47 +03:00
  • 36e6e888b7 Fix race in the CUDA DeepSeek FA kernel (#406) Kawrakow 2025-05-11 08:12:47 +03:00
  • 2f32589b8e Fix race in the CUDA DeepSeek FA kernel ik/fix_cuda_fa_race Iwan Kawrakow 2025-05-11 08:03:10 +03:00
  • 154a195f75 Minor ik/offload_policy Iwan Kawrakow 2025-05-10 19:07:02 +03:00
  • 3a671301f8 Adding GPU offload policy Iwan Kawrakow 2025-05-10 18:59:46 +03:00
  • a961f41762 TG improvements for MoE models (#404) Kawrakow 2025-05-10 18:52:54 +03:00
  • a2d24c97e5 TG improvements for MoE models (#404) Kawrakow 2025-05-10 18:52:54 +03:00
  • c4e1c2c905 CUDA: fix TG with SER ik/remove_unnessessary_ids_copy Iwan Kawrakow 2025-05-10 11:06:48 +03:00
  • b38112028e CPU: fix get_rows when SER is used Iwan Kawrakow 2025-05-10 10:18:33 +03:00
  • 10557832b1 cuda: Remove unnecessary device to host copy of row ids Iwan Kawrakow 2025-05-10 09:49:08 +03:00
  • 47fa8380c6 Handle incompatible DeepSeek GGUFs (#394) Kawrakow 2025-05-09 22:00:40 +03:00
  • 43a154d8b8 Handle incompatible DeepSeek GGUFs (#394) Kawrakow 2025-05-09 22:00:40 +03:00