Commit Graph

  • d7008ad52d constexpr and minor changes [ik/cuda_flash_mla3_v2] Iwan Kawrakow 2025-05-11 11:21:51 +03:00
  • d1601d463b Rearrange MLA K cache so it first new CUDA FA implementation Iwan Kawrakow 2025-05-11 10:48:26 +03:00
  • 130cdf2715 New DeepSeek FlashMLA Iwan Kawrakow 2025-05-11 09:58:03 +03:00
  • 0abcf0749e Fix race in the CUDA DeepSeek FA kernel (#406) Kawrakow 2025-05-11 08:12:47 +03:00
  • 36e6e888b7 Fix race in the CUDA DeepSeek FA kernel (#406) Kawrakow 2025-05-11 08:12:47 +03:00
  • 2f32589b8e Fix race in the CUDA DeepSeek FA kernel [ik/fix_cuda_fa_race] Iwan Kawrakow 2025-05-11 08:03:10 +03:00
  • 154a195f75 Minor [ik/offload_policy] Iwan Kawrakow 2025-05-10 19:07:02 +03:00
  • 3a671301f8 Adding GPU offload policy Iwan Kawrakow 2025-05-10 18:59:46 +03:00
  • a961f41762 TG improvements for MoE models (#404) Kawrakow 2025-05-10 18:52:54 +03:00
  • a2d24c97e5 TG improvements for MoE models (#404) Kawrakow 2025-05-10 18:52:54 +03:00
  • c4e1c2c905 CUDA: fix TG with SER [ik/remove_unnessessary_ids_copy] Iwan Kawrakow 2025-05-10 11:06:48 +03:00
  • b38112028e CPU: fix get_rows when SER is used Iwan Kawrakow 2025-05-10 10:18:33 +03:00
  • 10557832b1 cuda: Remove unnecessary device to host copy of row ids Iwan Kawrakow 2025-05-10 09:49:08 +03:00
  • 47fa8380c6 Handle incompatible DeepSeek GGUFs (#394) Kawrakow 2025-05-09 22:00:40 +03:00
  • 43a154d8b8 Handle incompatible DeepSeek GGUFs (#394) Kawrakow 2025-05-09 22:00:40 +03:00
  • a7e5b01540 Fix missing rope_freqs with convert_hf_to_gguf (#402) saood06 2025-05-09 09:17:41 -05:00
  • 967a2e1860 Fix missing rope_freqs with convert_hf_to_gguf (#402) saood06 2025-05-09 09:17:41 -05:00
  • caf309157b convert : adapt MiniCPM3 to separate rope_freqs insertion [s6/rope_freq_fix] Francis Couture-Harpin 2024-09-16 12:05:29 -04:00
  • 43b8bcf62d convert : refactor rope_freqs generation Francis Couture-Harpin 2024-09-08 20:01:13 -04:00
  • 2c6e01d66d lora : fix llama conversion script with ROPE_FREQS Xuan Son Nguyen 2024-08-21 14:53:54 +02:00
  • b64cb29713 Update README.md Kawrakow 2025-05-09 11:16:36 +03:00
  • e5a4a3ce78 Update README.md Kawrakow 2025-05-09 11:16:36 +03:00
  • dd2014a853 Fix CUDA FlashMLA-3 with quantized KV cache (#400) Kawrakow 2025-05-09 10:22:48 +03:00
  • 8777fc4855 Fix CUDA FlashMLA-3 with quantized KV cache (#400) Kawrakow 2025-05-09 10:22:48 +03:00
  • 957a6e7911 Update README.md Kawrakow 2025-05-09 10:13:25 +03:00
  • 496451a1d4 Update README.md Kawrakow 2025-05-09 10:13:25 +03:00
  • 87bfad8437 Support for Llama-3-Nemotron models (#377) saood06 2025-05-09 02:09:59 -05:00
  • bc6ae515ce Support for Llama-3-Nemotron models (#377) saood06 2025-05-09 02:09:59 -05:00
  • c5ed8f4069 Fix CUDA FlashMLA-3 with quantized KV cache [ik/cuda_fix_quantized_flash_mla3] Iwan Kawrakow 2025-05-09 09:36:38 +03:00
  • 828758ec0d Update README.md Kawrakow 2025-05-07 18:59:01 +03:00
  • 4084ca7331 Update README.md Kawrakow 2025-05-07 18:59:01 +03:00
  • 2565b29f33 Handle incompatible DeepSeek GGUFs [ik/handle_incompatible_deepseek_ggufs] Iwan Kawrakow 2025-05-07 18:25:40 +03:00
  • 92ceda1d06 FlashMLA-3 for DeepSeek models on CUDA (#386) Kawrakow 2025-05-07 17:38:22 +03:00
  • 30536ee369 FlashMLA-3 for DeepSeek models on CUDA (#386) Kawrakow 2025-05-07 17:38:22 +03:00
  • 5436acdb6c fix some MSVC build problem. (#392) Gaolingx 2025-05-07 22:04:39 +08:00
  • 17c6fc6b73 fix some MSVC build problem. (#392) Gaolingx 2025-05-07 22:04:39 +08:00
  • 8a2d611083 Fix DeepSeek q8_0 cache (#391) Kawrakow 2025-05-07 12:06:49 +03:00
  • 8a5c0410e1 Fix DeepSeek q8_0 cache (#391) Kawrakow 2025-05-07 12:06:49 +03:00
  • 93d053f7ab Fix DeepSeek q8_0 cache [ik/fix_deepseek_q80_cache] Iwan Kawrakow 2025-05-07 12:02:05 +03:00
  • 6104bf5296 Fix build for Xeon Gold 6226R (#390) Kawrakow 2025-05-07 10:33:27 +03:00
  • 090eae4d69 Fix build for Xeon Gold 6226R (#390) Kawrakow 2025-05-07 10:33:27 +03:00
  • e6da985f02 Fix build for Xeon Gold 6226R [ik/fix_xeon_6226R] Iwan Kawrakow 2025-05-07 10:23:18 +03:00
  • 1982beb005 Minor tweak [ik/cuda_flash_mla3] Iwan Kawrakow 2025-05-07 09:07:34 +03:00
  • 53e7e7790e Minor Iwan Kawrakow 2025-05-06 19:47:55 +03:00
  • 59a3e361a3 Also add these Iwan Kawrakow 2025-05-06 16:27:41 +03:00
  • c36fa20d2a Finalizing Iwan Kawrakow 2025-05-06 15:54:52 +03:00
  • 4edfc6712a Sadly, the previous commit was wrong Iwan Kawrakow 2025-05-06 15:05:05 +03:00
  • 0fee6c54d9 Much better Iwan Kawrakow 2025-05-06 12:59:05 +03:00
  • ed5990712d CUDA WIP: support for FlashMLA-3 Iwan Kawrakow 2025-05-06 09:32:02 +03:00
  • 6e7b28f7b0 Update README.md Kawrakow 2025-05-06 08:48:11 +03:00
  • 6c23618ca5 Update README.md Kawrakow 2025-05-06 08:48:11 +03:00
  • 296367a50d Update vocab.py [s6/deci_support] Iwan Kawrakow 2025-05-05 09:37:01 +03:00
  • b08471f717 Fix DeepSeek FA (#382) Kawrakow 2025-05-05 08:39:10 +03:00
  • e3fec17347 Fix DeepSeek FA (#382) Kawrakow 2025-05-05 08:39:10 +03:00
  • f455ead8aa Fix DeepSeek FA [ik/fix_deepseek_fattn] Iwan Kawrakow 2025-05-05 08:31:55 +03:00
  • 0e00121596 Better n_attention_vw rule Saood Karim 2025-05-04 10:35:48 -05:00
  • e508a3f194 Remove errant Granite mentions Saood Karim 2025-05-04 10:28:20 -05:00
  • 5e0fb61eb7 DeciLMCausalModel now reads rope_theta from config.json properly Yee Man Chan 2024-12-29 22:17:00 +08:00
  • a1a208a535 Untested support of 253B Saood Karim 2025-05-04 10:17:45 -05:00
  • 1eeeaefdc7 Changes to n_attention_wv rule Saood Karim 2025-05-04 08:02:16 -05:00
  • 8b3721312d Changes to make work and add longrope support Saood Karim 2025-05-04 06:42:43 -05:00
  • df638764fb conflict resolution Yee Man Chan 2024-12-19 10:41:32 +08:00
  • 45cd1bcd59 CUDA: MMQ for IQ4_KS (#374) Kawrakow 2025-05-04 12:45:00 +03:00
  • f7c9a0f036 CUDA: MMQ for IQ4_KS (#374) Kawrakow 2025-05-04 12:45:00 +03:00
  • db0ed280f1 Update README.md Kawrakow 2025-05-04 12:06:47 +03:00
  • 1328128298 Update README.md Kawrakow 2025-05-04 12:06:47 +03:00
  • 7cb99f8078 Update README.md Kawrakow 2025-05-04 11:49:29 +03:00
  • 7cb6a76cd0 Update README.md Kawrakow 2025-05-04 11:49:29 +03:00
  • a3975acd4c Add batch warmup to sweep-bench [ik/sweep_bench_warmup] Iwan Kawrakow 2025-05-04 11:21:19 +03:00
  • 3498ea4228 CUDA: MMQ for iq4_ks now works [ik/cuda_mmq_iq4_ks] Iwan Kawrakow 2025-05-04 08:58:18 +03:00
  • ca7b671e54 WIP: still getting illegal memory access Iwan Kawrakow 2025-05-04 08:49:30 +03:00
  • 1d249476d6 WIP Iwan Kawrakow 2025-05-03 20:04:29 +03:00
  • 711ba7e8f4 CUDA: faster FA TG for GQA models (#370) Kawrakow 2025-05-04 09:17:44 +03:00
  • ce2b0292e1 CUDA: faster FA TG for GQA models (#370) Kawrakow 2025-05-04 09:17:44 +03:00
  • fdbdb5310a Another attempt to fix #367 (#371) Kawrakow 2025-05-04 09:02:12 +03:00
  • b890e01238 Another attempt to fix #367 (#371) Kawrakow 2025-05-04 09:02:12 +03:00
  • 5782f1bdf0 Yet another [ik/try_fix_367_v2] Iwan Kawrakow 2025-05-03 20:00:20 +03:00
  • 5605474193 Another attempt to fix #367 Iwan Kawrakow 2025-05-03 19:28:06 +03:00
  • 8db70379ae cmake: force MSVC compiler charset to utf-8 (#369) Gaolingx 2025-05-03 20:56:29 +08:00
  • ab7f694b71 cmake: force MSVC compiler charset to utf-8 (#369) Gaolingx 2025-05-03 20:56:29 +08:00
  • 056f08182a Use MMA for TG also when quantized [ik/fattn_mma] Iwan Kawrakow 2025-05-03 15:34:56 +03:00
  • 758ca617cd Trying to fix iq1_s_r4/iq1_m_r4 quantization failure (#368) Kawrakow 2025-05-03 14:43:55 +03:00
  • afcfa85756 Trying to fix iq1_s_r4/iq1_m_r4 quantization failure (#368) Kawrakow 2025-05-03 14:43:55 +03:00
  • 267a12aaa0 Trying to fix iq1_s_r4/iq1_m_r4 quantization failure [ik/try_fix_367] Iwan Kawrakow 2025-05-03 13:53:39 +03:00
  • dd46438d25 cuda: WIP MMA FA Iwan Kawrakow 2025-05-03 13:29:25 +03:00
  • 892e96be53 Fix FA bug on AVX2 (#364) Kawrakow 2025-05-02 07:09:09 +02:00
  • 1ea1df4b2d Fix FA bug on AVX2 (#364) Kawrakow 2025-05-02 07:09:09 +02:00
  • aca68016d8 Fix model architecture name (#366) saood06 2025-05-02 00:07:24 -05:00
  • d37add8b39 Fix model architecture name (#366) saood06 2025-05-02 00:07:24 -05:00
  • 0e247afcac Fix model architecture name [s6/bitnet_name_update] junhuihe 2025-04-21 17:19:43 +08:00
  • 2b7061967a Also this was wrong [ik/fix_fa_avx2_bug] Iwan Kawrakow 2025-05-01 19:05:08 +03:00
  • a0d10704cd Dynamic Yarn [s6/qwen3_dynamic_yarn] Saood Karim 2025-05-01 07:24:02 -05:00
  • 1d177a69d4 Fix FA bug on AVX2 Iwan Kawrakow 2025-05-01 09:50:25 +03:00
  • 9303df7450 Update README.md (#352) Kawrakow 2025-04-30 15:11:29 +02:00
  • 98d1626469 Update README.md (#352) Kawrakow 2025-04-30 15:11:29 +02:00
  • 6c70182744 Updates [ikawrakow-patch-1-1] Iwan Kawrakow 2025-04-30 16:10:45 +03:00
  • 1ea49001f3 Fix IQK_FA_ALL_QUANTS on AVX2 (#360) Kawrakow 2025-04-30 10:45:43 +02:00
  • 4c2bee0bed Fix IQK_FA_ALL_QUANTS on AVX2 (#360) Kawrakow 2025-04-30 10:45:43 +02:00
  • b05c85e487 Make it also work, not just compile [ik/fix_358] Iwan Kawrakow 2025-04-30 11:45:07 +03:00
  • e8dc26d544 Fix IQK_FA_ALL_QUANTS on AVX2 Iwan Kawrakow 2025-04-30 11:30:11 +03:00