Commit Graph

  • 27e8ed6454 This seems very slightly better ik/cuda_swa2 Iwan Kawrakow 2025-09-02 18:24:45 +03:00
  • 32e223df46 Fix it Iwan Kawrakow 2025-09-02 14:35:05 +03:00
  • be2694eb68 Add n_swa to FA parameters Iwan Kawrakow 2025-09-02 14:19:22 +03:00
  • c2500dbb04 Bounds for flash attention Iwan Kawrakow 2025-09-02 13:01:11 +03:00
  • 727f7b7d9f Refactor CUDA flash attention (#745) Kawrakow 2025-09-02 10:12:56 +02:00
  • 3433c7b56d Refactor CUDA flash attention (#745) Kawrakow 2025-09-02 10:12:56 +02:00
  • d29c21ecbc Set default value of GGML_SCHED_MAX_COPIES to 1 (#751) Kawrakow 2025-09-02 07:04:39 +02:00
  • 1f4346381f Set default value of GGML_SCHED_MAX_COPIES to 1 (#751) Kawrakow 2025-09-02 07:04:39 +02:00
  • 8c67621b4b Set default value of GGML_SCHED_MAX_COPIES to 1 ik/sched_max_copies=1 Iwan Kawrakow 2025-09-02 08:01:42 +03:00
  • 56e0f897ae Revert "CUDA: prompt processing optimizations for MoE models (#739)" (#748) Kawrakow 2025-09-02 06:55:48 +02:00
  • 62f5382c2b Revert "CUDA: prompt processing optimizations for MoE models (#739)" (#748) Kawrakow 2025-09-02 06:55:48 +02:00
  • f8d511a30f Revert "CUDA: prompt processing optimizations for MoE models (#739)" ik/revert_739 Iwan Kawrakow 2025-09-01 20:06:57 +03:00
  • bb21114ab4 Slightly better PP ik/cuda_refactor_fattn Iwan Kawrakow 2025-09-01 09:17:19 +03:00
  • e2e5b270c5 Move mma launch to fattn-mma-f16.cuh Iwan Kawrakow 2025-08-23 16:29:39 +03:00
  • 86fd97406c Remove unnecessary includes from fattn.cu Iwan Kawrakow 2025-08-23 15:59:55 +03:00
  • f3b08d1347 Factor out vec Iwan Kawrakow 2025-08-23 15:55:02 +03:00
  • ba445d9ebc Factor out wmma Iwan Kawrakow 2025-08-23 15:42:55 +03:00
  • 3b0ace97a8 Factor out mma Iwan Kawrakow 2025-08-23 15:35:35 +03:00
  • cc73811ddc Remove double definition of LLAMA_LOG_DEBUG Kawrakow 2025-09-01 08:42:04 +03:00
  • d10d90ae27 Remove double definition of LLAMA_LOG_DEBUG Iwan Kawrakow 2025-09-01 08:42:04 +03:00
  • d7882c3cf8 Tool calls support from mainline (#723) firecoperana 2025-09-01 00:38:49 -05:00
  • 0f9ecaec04 Tool calls support from mainline (#723) firecoperana 2025-09-01 00:38:49 -05:00
  • 8de297b795 Fused FFN_UP+FFN_GATE op (#741) Kawrakow 2025-08-31 18:16:36 +03:00
  • b66cecca45 Fused FFN_UP+FFN_GATE op (#741) Kawrakow 2025-08-31 18:16:36 +03:00
  • 86e927bfe9 minor s6/fix_prompt_tokenization Saood Karim 2025-08-30 04:48:12 -05:00
  • 640f9b6485 This seems to work Saood Karim 2025-08-30 04:45:31 -05:00
  • 3bc7acf1bd Add command line option ik/fused_ffn_up_gate Iwan Kawrakow 2025-08-30 11:56:37 +03:00
  • df066ced5e Seems to be working on CUDA Iwan Kawrakow 2025-08-30 11:29:27 +03:00
  • 4360ff545e WIP CUDA Iwan Kawrakow 2025-08-30 11:16:48 +03:00
  • 06aacc3167 Fused up+gate+unary for regular (not MoE) FFN - CPU Iwan Kawrakow 2025-08-30 09:53:00 +03:00
  • d55e98519f CUDA: prompt processing optimizations for MoE models (#739) Kawrakow 2025-08-30 12:09:41 +03:00
  • f22a9ef95a CUDA: prompt processing optimizations for MoE models (#739) Kawrakow 2025-08-30 12:09:41 +03:00
  • 411606d73b Chat fixes ik/fix_missing_end Iwan Kawrakow 2025-08-30 08:21:55 +03:00
  • b0a1c63279 This is slightly better ik/skip_rowids_computation Iwan Kawrakow 2025-08-29 15:47:22 +03:00
  • 486f1adc1e Also this barely moves the needle Iwan Kawrakow 2025-08-28 18:20:18 +03:00
  • 37ef1d3719 Also this doesn't do much Iwan Kawrakow 2025-08-28 09:28:00 +03:00
  • 0ce8068d2b Skip the row id computation for the ffn_down op Iwan Kawrakow 2025-08-27 18:07:57 +03:00
  • f529c3a808 Sanitize imatrix (#735) Kawrakow 2025-08-29 09:08:15 +03:00
  • 46968d4ab1 Sanitize imatrix (#735) Kawrakow 2025-08-29 09:08:15 +03:00
  • 29be3e93c4 Make yarn_log_multiplier optional (#738) Kawrakow 2025-08-28 14:09:59 +03:00
  • 872ac10b02 Make yarn_log_multiplier optional (#738) Kawrakow 2025-08-28 14:09:59 +03:00
  • f5b3ca8c95 Make yarn_log_multiplier optional ik/optional_yarn_log_multiplier Iwan Kawrakow 2025-08-28 09:47:23 +03:00
  • aa340974f6 Add more checks for iq3_k, iq3_ks ik/sanitize_importance_iqk Iwan Kawrakow 2025-08-28 07:47:37 +03:00
  • e760b4dc41 Check for NaNs while loading the model. (#727) Kawrakow 2025-08-27 19:00:17 +03:00
  • dac5b48398 Check for NaNs while loading the model. (#727) Kawrakow 2025-08-27 19:00:17 +03:00
  • c9b50fd45c Minor Iwan Kawrakow 2025-08-27 16:57:54 +03:00
  • 756f3df7d3 sanitize imatrix: repacked i-quants Iwan Kawrakow 2025-08-27 15:57:10 +03:00
  • f220b83d7c sanitize imatrix: q2_k_r4, q4_k_r4, q5_k_r4, q6_k_r4 Iwan Kawrakow 2025-08-27 15:41:51 +03:00
  • b052addfca sanitize imatrix: iq4_xs_r8 and q3_k_r4 with a template Iwan Kawrakow 2025-08-27 15:28:45 +03:00
  • fcbf11e4a8 sanitize imatrix: iq4_xs_r8 Iwan Kawrakow 2025-08-27 15:01:55 +03:00
  • deb55fff63 sanitize imatrix: q6_0_r4 Iwan Kawrakow 2025-08-27 14:55:26 +03:00
  • c5de02cffa sanitize imatrix: q4_0_r8 Iwan Kawrakow 2025-08-27 14:42:46 +03:00
  • 3ab0e41793 sanitize imatrix: iq4_nl_r4 Iwan Kawrakow 2025-08-27 14:36:00 +03:00
  • 44fc0a5eb6 sanitize imatrix: iq5_ks Iwan Kawrakow 2025-08-27 14:24:35 +03:00
  • 63efad8978 sanitize imatrix: iq2_ks and iq2_kl Iwan Kawrakow 2025-08-27 14:19:27 +03:00
  • e935625f12 sanitize imatrix: iq4_kss Iwan Kawrakow 2025-08-27 14:07:08 +03:00
  • a6ecc679a2 sanitize imatrix: iq4_ks Iwan Kawrakow 2025-08-27 13:58:54 +03:00
  • b6f1bd68bb sanitize importance matrix: iq5_k, iq6_k Iwan Kawrakow 2025-08-27 13:52:23 +03:00
  • b41e8ef6d4 sanitize importance matrix: iq4_k Iwan Kawrakow 2025-08-27 13:35:07 +03:00
  • 6f4dd9c5d1 sanitize importance matrix: WIP Iwan Kawrakow 2025-08-27 13:24:09 +03:00
  • ca5b6ab9b1 Fix typo Kawrakow 2025-08-27 14:43:44 +03:00
  • 03ce6fdb0d Fix typo Iwan Kawrakow 2025-08-27 14:43:44 +03:00
  • 91d056209a Add checks for more quantization types ik/validate_quants_on_load Iwan Kawrakow 2025-08-27 09:15:31 +03:00
  • 75d7ccadf7 Add checks for more quantization types Iwan Kawrakow 2025-08-27 09:02:18 +03:00
  • c04b918a01 Add command line option to validate quants Iwan Kawrakow 2025-08-27 08:31:59 +03:00
  • 3add753ed7 Also tell which experts have NaNs. Iwan Kawrakow 2025-08-25 11:46:30 +03:00
  • 3b94b641d6 Check for NaNs while loading the model. Iwan Kawrakow 2025-08-25 11:18:49 +03:00
  • 1dcc34f70a Heuristics for mmq_id -> original threshold (#734) Kawrakow 2025-08-27 08:17:41 +03:00
  • 966a6ce93c Heuristics for mmq_id -> original threshold (#734) Kawrakow 2025-08-27 08:17:41 +03:00
  • adee94976c Heuristics for mmq_id -> original threshold ik/mmq_id_thresh Iwan Kawrakow 2025-08-27 08:11:46 +03:00
  • 6afe9b48ab Sanitize importances for KT quantization (#720) Kawrakow 2025-08-27 08:04:15 +03:00
  • 931f04af53 Sanitize importances for KT quantization (#720) Kawrakow 2025-08-27 08:04:15 +03:00
  • 3dc4dffed5 Fix avx2 GEMM mess (v2) (#724) Kawrakow 2025-08-27 08:03:47 +03:00
  • 683d6f1fc8 Fix avx2 GEMM mess (v2) (#724) Kawrakow 2025-08-27 08:03:47 +03:00
  • 24fb00637e Adding forgotten q8_0_r8 to num_rows() ik/fix_avx2_gemm_mess Iwan Kawrakow 2025-08-27 08:00:14 +03:00
  • 7060acb4a0 Slightly more clear Iwan Kawrakow 2025-08-24 08:19:53 +03:00
  • 8b68a21aea This does it for iq4_nl on Zen4, but FA does not work Iwan Kawrakow 2025-08-23 19:45:59 +03:00
  • 8e30a22c80 This does it for iq4_nl, including FA Iwan Kawrakow 2025-08-23 19:07:17 +03:00
  • 5466311174 This fixes confusion around Q8_0 on AVX2 Iwan Kawrakow 2025-08-23 18:41:05 +03:00
  • ac4ec50f03 CUDA: muh faster prompt processing for MoE models and small u-batch sizes (#728) Kawrakow 2025-08-26 13:30:35 +03:00
  • 0cc32ff0b1 CUDA: muh faster prompt processing for MoE models and small u-batch sizes (#728) Kawrakow 2025-08-26 13:30:35 +03:00
  • c411d443ee Add CUDA fp8 header ik/add_mmq_id Iwan Kawrakow 2025-08-26 08:43:59 +03:00
  • 12ae77b8cd mmq_id: add iq3_kt, iq4_kt Iwan Kawrakow 2025-08-25 18:46:04 +03:00
  • 3d87c2d7b2 mmq_id: adding iq1_kt, iq2_kt Iwan Kawrakow 2025-08-25 18:09:55 +03:00
  • 00a0aad47d mmq_id: add iq1_s_r4 Iwan Kawrakow 2025-08-25 17:15:49 +03:00
  • a6fe757cd8 mmq_id: adding iq6_k Iwan Kawrakow 2025-08-25 17:03:16 +03:00
  • 601c6006d9 mmq_id: adding iq5_k, iq5_k_r4, q6_0 Iwan Kawrakow 2025-08-25 16:52:47 +03:00
  • 8f3c813ab0 mmq_id: adding iq5_ks, iq5_ks_r4 Iwan Kawrakow 2025-08-25 16:07:38 +03:00
  • 20ff716428 mmq_id: adding iq4_k, iq4_k_r4 Iwan Kawrakow 2025-08-25 15:48:29 +03:00
  • b0afe8dc20 mmq_id: add iq4_kss, iq4_ks, iq4_ks_r4 Iwan Kawrakow 2025-08-25 15:31:15 +03:00
  • 951971080d mmq_id: adding iq3_k, iq3_k_r4 Iwan Kawrakow 2025-08-25 15:03:58 +03:00
  • 66c31e17b3 mmq_id: add iq3_ks Iwan Kawrakow 2025-08-25 14:48:04 +03:00
  • 45497a0209 mmq_id: add iq2_kl Iwan Kawrakow 2025-08-25 14:36:49 +03:00
  • 4ac23588c1 mmq_id: add iq2_ks Iwan Kawrakow 2025-08-25 14:22:24 +03:00
  • d9114301c0 mmiq_id: don't assume row size is multiple of type size Iwan Kawrakow 2025-08-25 13:53:51 +03:00
  • 9031898cfd mmiq_id: don't assume row size is multiple of type size (per row scales) Iwan Kawrakow 2025-08-25 13:42:09 +03:00
  • cae0f4dfa4 Fix undefined template std::basic_string<char> (#726) Mohan Krishnan 2025-08-25 16:34:01 +08:00
  • 50f7119dfd Fix undefined template std::basic_string<char> (#726) Mohan Krishnan 2025-08-25 16:34:01 +08:00
  • 16e477a945 mmq_id: add iq2_k, iq2_k_r4 Iwan Kawrakow 2025-08-25 08:52:32 +03:00
  • e9afb0b8fc This works for mainline supported quants Iwan Kawrakow 2025-08-25 07:46:20 +03:00