Commit Graph

  • a2f3b08fbd merge_qkv: qwen3 (dense) Iwan Kawrakow 2025-10-29 15:59:38 +02:00
  • ed31b6741f merge_qkv: llama-4 Iwan Kawrakow 2025-10-29 15:44:24 +02:00
  • 5c138c4d08 merge_qkv: fix tensor dimensions Iwan Kawrakow 2025-10-29 13:57:33 +02:00
  • 4e8f371e76 merge_qkv: add command line argument to enable Iwan Kawrakow 2025-10-29 13:27:55 +02:00
  • ca5cff8677 merge_qkv: glm4.5moe Iwan Kawrakow 2025-10-29 12:59:05 +02:00
  • 18765a4907 merge_qkv: bias can be required, optional, or mandatory Iwan Kawrakow 2025-10-29 11:26:04 +02:00
  • 6c53a97122 Don't ignore the return value of create_tensors() Iwan Kawrakow 2025-10-29 11:15:20 +02:00
  • 2b3af4addc WIP Iwan Kawrakow 2025-10-29 10:52:07 +02:00
  • c699846aa6 merge_qkv: it works for gpt-oss Iwan Kawrakow 2025-10-29 10:42:37 +02:00
  • 446b4a4da3 WIP Iwan Kawrakow 2025-10-29 09:45:30 +02:00
  • d73914c70b POC: merge Q, K, V into a single, contiguous tensor Iwan Kawrakow 2025-10-29 09:17:51 +02:00
  • 0459f595d7 CUDA: correctly detect if flash attention is supported (#875) Kawrakow 2025-10-29 13:56:16 +02:00
  • c33f39d58f CUDA: correctly detect if flash attention is supported (#875) Kawrakow 2025-10-29 13:56:16 +02:00
  • d0992d6e1f Fix device parsing bug Kawrakow 2025-10-29 08:28:10 +02:00
  • 9a651e8476 Fix device parsing bug Iwan Kawrakow 2025-10-29 08:28:10 +02:00
  • d50c2490fc correct typo (#876) Nexes the Elder 2025-10-28 18:01:45 +01:00
  • 0ba5424fbf correct typo (#876) Nexes the Elder 2025-10-28 18:01:45 +01:00
  • e2b7da9684 Minor ik/fattn_is_supported Iwan Kawrakow 2025-10-28 15:24:08 +02:00
  • 81074219e1 Also wmma Iwan Kawrakow 2025-10-28 15:21:25 +02:00
  • b8fbfe5487 Correctly determine if FA is supported Iwan Kawrakow 2025-10-28 12:29:50 +02:00
  • 3f6b6980b1 Don't use vector kernels if K or V are quantized Iwan Kawrakow 2025-10-28 11:09:42 +02:00
  • e68dabc242 A few server commits from mainline. (#872) Nexes the Elder 2025-10-28 08:58:31 +01:00
  • e672bc59dd A few server commits from mainline. (#872) Nexes the Elder 2025-10-28 08:58:31 +01:00
  • 0a80135392 Fix warnings about LLAMA_DEBUG being redefined Kawrakow 2025-10-27 18:41:03 +02:00
  • 65763a2a70 Fix warnings about LLAMA_DEBUG being redefined Iwan Kawrakow 2025-10-27 18:41:03 +02:00
  • 904e994bfb Support --device and --device-draft parameter (#866) firecoperana 2025-10-27 16:13:28 +00:00
  • 6dc5bd847b Support --device and --device-draft parameter (#866) firecoperana 2025-10-27 16:13:28 +00:00
  • eb8116b097 Even more fused ops (#868) Kawrakow 2025-10-27 16:09:01 +02:00
  • bdf4f0ddce Even more fused ops (#868) Kawrakow 2025-10-27 16:09:01 +02:00
  • bf991ba60a Add --webui arg to launch llama.cpp new webui (#786) firecoperana 2025-10-27 12:22:02 +00:00
  • d894998fa6 Add --webui arg to launch llama.cpp new webui (#786) firecoperana 2025-10-27 12:22:02 +00:00
  • 40eabe5dd8 fix bugs (#870) firecoperana 2025-10-27 12:17:48 +00:00
  • 6848a0aac6 fix bugs (#870) firecoperana 2025-10-27 12:17:48 +00:00
  • 1f14f50dfd Try removing copy indirection ik/try_remove_cpy_indirection Iwan Kawrakow 2025-10-27 11:39:18 +02:00
  • 444782523d Make sure the bias really is 1 row to use fusion ik/fuse_biased_qkv Iwan Kawrakow 2025-10-27 07:10:03 +02:00
  • 1cf4f21463 Cleanup Iwan Kawrakow 2025-10-26 18:59:38 +02:00
  • 21a37bfc6c Faster copy when tensors are contiguous Iwan Kawrakow 2025-10-26 16:59:31 +02:00
  • 837c219b51 More gemv+add fusing Iwan Kawrakow 2025-10-26 09:32:02 +02:00
  • 5ddde01542 Fuse Q, K, V gemv+add Iwan Kawrakow 2025-10-25 17:30:45 +03:00
  • e34399c116 CUDA: fuse ffn_up*unary_op(ffn_gate) for MMVQ (V2) (#864) Kawrakow 2025-10-26 17:08:50 +02:00
  • f76e98536f CUDA: fuse ffn_up*unary_op(ffn_gate) for MMVQ (V2) (#864) Kawrakow 2025-10-26 17:08:50 +02:00
  • a5b16b82bb More gemv+add fusing ik/biased_qkv Iwan Kawrakow 2025-10-26 09:32:02 +02:00
  • 7da6fb0979 Fuse Q, K, V gemv+add Iwan Kawrakow 2025-10-25 17:30:45 +03:00
  • d0861f83f1 Fix TG fused up*unary(gate) when down cannot be fused ik/reorg_mmvq_and_fuse_bias Iwan Kawrakow 2025-10-26 08:38:25 +02:00
  • 8d01a40f5b Disable assert Iwan Kawrakow 2025-10-25 15:21:13 +03:00
  • 5bd9cd3490 Add diagnostics Iwan Kawrakow 2025-10-25 11:12:16 +03:00
  • a7d050426a Somehow I forgot to change the ggml_type in the legacy template calls Iwan Kawrakow 2025-10-25 11:09:37 +03:00
  • 5ff69380db Put iqk mmvq implementations into template instances Iwan Kawrakow 2025-10-25 08:18:02 +03:00
  • 79d1c4ebb9 Split mmvq.cu and iqk_mmvq.cu into separate template instances Iwan Kawrakow 2025-10-25 06:53:56 +03:00
  • 1fcae126cf Also iqk quants Iwan Kawrakow 2025-10-24 18:55:44 +03:00
  • 6b57074431 Fuse mul_mat_id and add_id into a single kernel for mmvq Iwan Kawrakow 2025-10-24 18:29:16 +03:00
  • 4a08ac7241 Fusing mmvq also in non-MoE up+gate Iwan Kawrakow 2025-10-24 16:02:23 +03:00
  • 196e73588c Fusing also for iqk/trellis/repacked quants Iwan Kawrakow 2025-10-24 15:05:18 +03:00
  • 3da71dcda2 Fused ffn_up*unary_op(ffn_gate) for MMVQ (with bias) Iwan Kawrakow 2025-10-24 13:07:54 +03:00
  • 73c551aa9e Fused ffn_up*unary_op(ffn_gate) for MMVQ (no bias) Iwan Kawrakow 2025-10-24 11:27:18 +03:00
  • b5cb6cd38e WIP Iwan Kawrakow 2025-10-24 10:34:43 +03:00
  • a46b5e337c Args for MMVQ functions Iwan Kawrakow 2025-10-24 09:06:21 +03:00
  • 41d6c42b96 Change flash attention and fmoe to be on by default (#863) Kawrakow 2025-10-25 09:37:28 +03:00
  • 16f30fcf31 Change flash attention and fmoe to be on by default (#863) Kawrakow 2025-10-25 09:37:28 +03:00
  • 6d05977940 Change flash attention to be on by default ik/change_fmoe_fa_defaults Iwan Kawrakow 2025-10-25 09:32:01 +03:00
  • 9dc0c89bc9 Change default fmoe also in llama-bench Iwan Kawrakow 2025-10-25 09:22:17 +03:00
  • 03a0f4d3cc Change fmoe to be on by default Iwan Kawrakow 2025-10-25 09:18:48 +03:00
  • 9e7d5ea64a Diagnostics ik/mmvq_args Iwan Kawrakow 2025-10-25 09:04:36 +03:00
  • 17dedc8cba Also iqk quants ik/mmvq_fuse_bias Iwan Kawrakow 2025-10-24 18:55:44 +03:00
  • efa858cb4a Fuse mul_mat_id and add_id into a single kernel for mmvq Iwan Kawrakow 2025-10-24 18:29:16 +03:00
  • 2fe22e3085 Fusing mmvq also in non-MoE up+gate Iwan Kawrakow 2025-10-24 16:02:23 +03:00
  • d0e8bb539a Fusing also for iqk/trellis/repacked quants Iwan Kawrakow 2025-10-24 15:05:18 +03:00
  • 8c7838a6c5 Fused ffn_up*unary_op(ffn_gate) for MMVQ (with bias) Iwan Kawrakow 2025-10-24 13:07:54 +03:00
  • 2986d3c21f Fused ffn_up*unary_op(ffn_gate) for MMVQ (no bias) Iwan Kawrakow 2025-10-24 11:27:18 +03:00
  • 590084b57f WIP Iwan Kawrakow 2025-10-24 10:34:43 +03:00
  • 8aff26a91f Args for MMVQ functions Iwan Kawrakow 2025-10-24 09:06:21 +03:00
  • 70c0095e11 Faster tensor name formatting (#860) Kawrakow 2025-10-24 07:46:18 +03:00
  • 2522c97dc9 Faster tensor name formatting (#860) Kawrakow 2025-10-24 07:46:18 +03:00
  • 1f96fc97c6 Faster tensor name formatting ik/format_name Iwan Kawrakow 2025-10-23 18:37:27 +03:00
  • 159051183b fused mul+multi_add: command line argument to disable it Iwan Kawrakow 2025-10-23 11:34:24 +03:00
  • c9b80b2665 Adding fused mul+multi_add + CPU implementation Iwan Kawrakow 2025-10-23 10:23:59 +03:00
  • 0549be76e5 Fused mul + multi_add op (#858) Kawrakow 2025-10-24 07:40:35 +03:00
  • db3ba4999f Fused mul + multi_add op (#858) Kawrakow 2025-10-24 07:40:35 +03:00
  • 2673f55808 fused mul+multi_add: command line argument to disable it ik/fused_mul_multiadd Iwan Kawrakow 2025-10-23 11:34:24 +03:00
  • 5b1efbe498 fused mul+multi_add: CUDA Iwan Kawrakow 2025-10-23 11:01:59 +03:00
  • 186c8d2975 Adding fused mul+multi_add + CPU implementation Iwan Kawrakow 2025-10-23 10:23:59 +03:00
  • 856c6da9c1 Fix experts mul node name (#857) Kawrakow 2025-10-23 09:46:01 +03:00
  • 483cea527d Fix experts mul node name (#857) Kawrakow 2025-10-23 09:46:01 +03:00
  • 637d1b014c Fix experts mul node name ik/fix_experts_node_name Iwan Kawrakow 2025-10-23 09:44:44 +03:00
  • ed4e1a6588 Fuse add+add+fused_rms (#853) Kawrakow 2025-10-22 16:18:11 +03:00
  • 0e1d33ca4a Fuse add+add+fused_rms (#853) Kawrakow 2025-10-22 16:18:11 +03:00
  • 3174233a9b Various: ik/fuse_add_add_fused_rms Iwan Kawrakow 2025-10-22 13:02:10 +03:00
  • 4bc2360f76 Macro to easily enable/disable fusion Iwan Kawrakow 2025-10-22 07:51:53 +03:00
  • c291fc056c Try this Iwan Kawrakow 2025-10-21 19:37:16 +03:00
  • d5cfbca86b Fuse add+add+fused_rms Iwan Kawrakow 2025-10-21 16:17:59 +03:00
  • af5bf60cc8 Hopefully this fixes #854 (#855) Kawrakow 2025-10-21 19:07:23 +03:00
  • 8aa3c2ec5e Hopefully this fixes #854 (#855) Kawrakow 2025-10-21 19:07:23 +03:00
  • 5e85b0ea51 Also this one ik/try_fix_854 Iwan Kawrakow 2025-10-21 19:06:04 +03:00
  • 278cf57f1f Hopefully this fixes #854 Iwan Kawrakow 2025-10-21 18:58:32 +03:00
  • 366d66bc1a Fuse add + fused_rms_norm (CUDA) (#852) Kawrakow 2025-10-21 14:29:50 +03:00
  • caf9759c97 Fuse add + fused_rms_norm (CUDA) (#852) Kawrakow 2025-10-21 14:29:50 +03:00
  • b81f7eb57a Combine add + fused_rms_norm ik/fuse_add_fused_rms Iwan Kawrakow 2025-10-21 08:48:26 +03:00
  • f1a8977da7 Combine all calls to llm_build_norm to a single line Iwan Kawrakow 2025-10-21 08:46:55 +03:00
  • a27d661aeb Fix fused grouped topk (#851) Kawrakow 2025-10-21 10:10:38 +03:00
  • 92231460cf Fix fused grouped topk (#851) Kawrakow 2025-10-21 10:10:38 +03:00
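
Technique sketches

The commit messages above are terse, so a few editorial sketches follow. They are toy C++ stand-ins for the ideas the messages name, not the actual ggml/CUDA implementations; every name, size, and value in them is invented for illustration.

The merge_qkv series (d73914c70b "POC: merge Q, K, V into a single, contiguous tensor" and follow-ups) amounts to stacking the three projection weights so a single matrix multiplication produces Q, K and V in one contiguous buffer, with the three outputs recovered as views:

```cpp
// Illustrative sketch only (not the ik_llama.cpp implementation): merge the
// Q, K, V projection weights into one contiguous matrix so a single mat-vec
// produces all three outputs at once.
#include <cstdio>
#include <vector>

// naive row-major mat-vec: y[r] = sum_c W[r*n_cols + c] * x[c]
static void matvec(const std::vector<float>& W, const std::vector<float>& x,
                   std::vector<float>& y, int n_rows, int n_cols) {
    for (int r = 0; r < n_rows; ++r) {
        float sum = 0.f;
        for (int c = 0; c < n_cols; ++c) sum += W[r*n_cols + c] * x[c];
        y[r] = sum;
    }
}

int main() {
    const int n_embd = 4, n_q = 4, n_kv = 2;   // toy sizes, not real model dims
    std::vector<float> Wq(n_q*n_embd, 0.1f), Wk(n_kv*n_embd, 0.2f), Wv(n_kv*n_embd, 0.3f);
    std::vector<float> x = {1, 2, 3, 4};

    // "merge": stack the rows of Wq, Wk, Wv into one contiguous weight matrix
    std::vector<float> Wqkv;
    Wqkv.insert(Wqkv.end(), Wq.begin(), Wq.end());
    Wqkv.insert(Wqkv.end(), Wk.begin(), Wk.end());
    Wqkv.insert(Wqkv.end(), Wv.begin(), Wv.end());

    // one mat-vec instead of three separate ones
    std::vector<float> qkv(n_q + 2*n_kv);
    matvec(Wqkv, x, qkv, n_q + 2*n_kv, n_embd);

    // Q, K and V are now just offsets (views) into the contiguous result
    const float *q = qkv.data(), *k = q + n_q, *v = k + n_kv;
    printf("q[0]=%g k[0]=%g v[0]=%g\n", q[0], k[0], v[0]);
}
```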
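The "Fuse Q, K, V gemv+add" commits fold the bias add into the gemv itself. Assuming the pattern is the usual y = W·x + b, starting the accumulator at the bias removes the separate add pass:

```cpp
// Hedged sketch of the gemv+add fusion: the bias add is folded into the
// gemv accumulation instead of running as a separate op afterwards.
#include <cstdio>
#include <vector>

int main() {
    const int n_rows = 3, n_cols = 4;           // toy sizes
    std::vector<float> W(n_rows*n_cols, 0.5f), b = {1, 2, 3}, x = {1, 1, 1, 1}, y(n_rows);
    for (int r = 0; r < n_rows; ++r) {
        float acc = b[r];                       // start from the bias: no separate add pass
        for (int c = 0; c < n_cols; ++c) acc += W[r*n_cols + c] * x[c];
        y[r] = acc;
    }
    for (float v : y) printf("%g ", v);         // prints 3 4 5
    printf("\n");
}
```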
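The "Fused ffn_up*unary_op(ffn_gate) for MMVQ" series computes the up and gate projections and applies the activation in one pass, so no intermediate up/gate tensors are materialized. A sketch with SiLU standing in for the unary op:

```cpp
// Illustrative sketch (not the CUDA MMVQ kernel): fuse the FFN up projection,
// gate projection, and the unary activation into a single pass over the rows.
#include <cmath>
#include <cstdio>
#include <vector>

static inline float silu(float v) { return v / (1.f + std::exp(-v)); }

int main() {
    const int n_embd = 4, n_ff = 3;             // toy sizes
    std::vector<float> W_up(n_ff*n_embd, 0.1f), W_gate(n_ff*n_embd, 0.2f);
    std::vector<float> x = {1, 2, 3, 4}, y(n_ff);

    for (int i = 0; i < n_ff; ++i) {
        float up = 0.f, gate = 0.f;
        for (int c = 0; c < n_embd; ++c) {      // one pass computes both dot products
            up   += W_up  [i*n_embd + c] * x[c];
            gate += W_gate[i*n_embd + c] * x[c];
        }
        y[i] = up * silu(gate);                 // activation applied in-register, no temp tensors
    }
    for (float v : y) printf("%g ", v);
    printf("\n");
}
```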
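For "Adding fused mul+multi_add": my reading of the messages (not the actual op definitions) is that several rows are each scaled by a per-row weight (the mul) and then summed into one row (the multi_add); fusing forms the products inside the accumulation loop so they are never written out:

```cpp
// Hedged sketch of fused mul + multi_add. The semantics here are a guess from
// the commit messages, not the real ggml ops: rows scaled by per-row weights,
// then summed, with the scaling done inside the accumulation loop.
#include <cstdio>
#include <vector>

int main() {
    const int n_rows = 3, n = 4;                 // e.g. expert outputs and routing weights
    std::vector<float> rows = {1,1,1,1,  2,2,2,2,  3,3,3,3};
    std::vector<float> w = {0.5f, 0.25f, 0.25f};
    std::vector<float> out(n, 0.f);

    for (int r = 0; r < n_rows; ++r)
        for (int j = 0; j < n; ++j)
            out[j] += w[r] * rows[r*n + j];      // mul and add in one pass

    for (float v : out) printf("%g ", v);        // prints 1.75 four times
    printf("\n");
}
```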
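The "Fuse add + fused_rms_norm" and "Fuse add+add+fused_rms" commits fold the residual add (or two of them) into the RMS-norm pass, computing the sum and its sum of squares in one sweep. A single-add sketch:

```cpp
// Illustrative sketch of fusing a residual add into an RMS norm: the sum
// t = x + res is formed once per element while accumulating sum(t*t),
// avoiding a separate add kernel and an extra read of the intermediate.
#include <cmath>
#include <cstdio>
#include <vector>

int main() {
    const float eps = 1e-6f;
    std::vector<float> x = {1, 2, 3, 4}, res = {0.5f, -1, 0, 2}, w = {1, 1, 1, 1};
    const int n = (int)x.size();

    std::vector<float> t(n), y(n);
    float ss = 0.f;
    for (int i = 0; i < n; ++i) {               // fused: add and sum-of-squares in one pass
        t[i] = x[i] + res[i];
        ss  += t[i]*t[i];
    }
    const float scale = 1.f / std::sqrt(ss/n + eps);
    for (int i = 0; i < n; ++i) y[i] = t[i]*scale*w[i];

    for (float v : y) printf("%g ", v);
    printf("\n");
}
```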