Commit Graph

  • 00ba208a5c Fix Gemma4 partial offload (#1657) main Kawrakow 2026-04-19 14:25:05 +02:00
  • d6657db245 Fix NaNs in Q4_K/Q5_K quantized MiniMax-2.7 models on CUDA (#1659) Kawrakow 2026-04-19 14:24:51 +02:00
  • 97369ccd1c Fix NaNs in Q4_K/Q5_K quantized MiniMax-2.7 models on CUDA ik/fix_cuda_nans Kawrakow 2026-04-19 11:06:58 +00:00
  • 8c54fc656d fix fcp/auto_parser2 firecoperana 2026-04-17 20:33:25 -05:00
  • 5f87445aa7 Autoparser - complete refactoring of parser architecture Piotr Wilkin (ilintar) 2026-03-10 17:54:23 -05:00
  • eb76fa5d0b Also here ik/fix_gemma4_hybrid Kawrakow 2026-04-18 20:37:12 +03:00
  • 1a59adb37d Fix Gemma4 partial offload Kawrakow 2026-04-18 20:19:55 +03:00
  • 8befd92ea5 Refactor model compute graphs (#1651) Kawrakow 2026-04-18 17:08:43 +02:00
  • 260622faf6 Self-decoding: Adds support for suffix decoding (#1646) Samuel Oliveira Alves 2026-04-18 11:10:10 -03:00
  • 52efa12fda fix: add missing __syncthreads in delta net CUDA kernel (#1649) markaalonzo 2026-04-17 15:45:46 -04:00
  • ebfed8f3fd Remove unused function ik/refactor_graphs Kawrakow 2026-04-17 09:14:39 +00:00
  • 7a153eabf8 Refactor model compute graphs Kawrakow 2026-04-17 09:11:08 +00:00
  • 64234e3c4e Fix compiler warning Kawrakow 2026-04-17 06:04:43 +00:00
  • 7b6507ddac server: fix usage stats (#1647) firecoperana 2026-04-17 00:27:47 -05:00
  • a42f898d35 fix: use int8_t for GGUF bool array loading instead of platform-dependent bool (#1648) markaalonzo 2026-04-17 01:25:07 -04:00
  • d0f1e043b9 server: fix usage stats fcp/fix_usage firecoperana 2026-04-16 20:19:04 -05:00
  • eaf83865a1 Vision support for Gemma4 (#1635) Kawrakow 2026-04-16 17:26:31 +02:00
  • 539d1cf989 Disallow speculation for hybrid/recurrent models (#1645) Kawrakow 2026-04-16 17:21:44 +02:00
  • 01a3b4d134 Disallow speculation for hybrid/recurrent models ik/disallow_speculation_for_hybrid Kawrakow 2026-04-16 13:47:39 +00:00
  • 98a2558025 Minor ik/try_minimax_better_sm_graph Kawrakow 2026-04-16 13:09:28 +00:00
  • 894bf90cbc Minor Kawrakow 2026-04-16 12:41:38 +00:00
  • e4d3188ee8 This works Kawrakow 2026-04-16 12:13:22 +00:00
  • 8df5cbc0b3 Fix Build and Push Container Image (#1633) Yadir Hernandez Batista 2026-04-16 05:24:19 -04:00
  • 42e17667c4 WIP Kawrakow 2026-04-16 09:15:41 +00:00
  • f095f9eaa9 WIP: Better graph parallel for MiniMax Kawrakow 2026-04-16 08:11:20 +00:00
  • 4945d3b7d0 Update AUTHORS Kawrakow 2026-04-16 08:55:50 +02:00
  • 4f4bcfbe67 Add --defer-experts flag to defer expert mmap residency on Linux (#1634) dmaivel 2026-04-16 02:54:44 -04:00
  • 0b81212dea CPU: allow all supported quantization types for FlashMLA (#1641) Kawrakow 2026-04-16 08:37:20 +02:00
  • b921ac3eb2 CPU: allow all supported quantization types for FlashMLA ik/cpu_mla_all_quants Kawrakow 2026-04-16 06:27:57 +00:00
  • 470d3a3b5b Add support for parallel graphs to GLM MTP (#1637) Samuel Oliveira Alves 2026-04-16 03:05:34 -03:00
  • 1163af96cf cuda: cap host MMQ tile size on Volta to match device kernels (#1638) Horacio Vico 2026-04-15 09:08:41 -03:00
  • 2b30f765b3 This seems to work ik/gemma4_vision Kawrakow 2026-04-14 12:10:59 +00:00
  • 21a00f3679 Still not working Kawrakow 2026-04-10 09:00:28 +00:00
  • 6bb40a41e1 GLU was not advertised as supported on CUDA Kawrakow 2026-04-10 07:07:20 +00:00
  • aac1cb69e8 Remove unnecessary assert in CUDA rms_norm Kawrakow 2026-04-10 06:56:08 +00:00
  • c51234d1fc WIP: Gemma4 vision Kawrakow 2026-04-10 06:49:24 +00:00
  • 55d3c05bf7 Fused fused_rms_norm + fused_rms_norm + add (#1627) Kawrakow 2026-04-13 13:24:39 +02:00
  • 2469015f10 Cleanup ik/fuse_rms_rms_add Kawrakow 2026-04-13 05:41:08 +00:00
  • 5e395f0ed9 Dedicated fused_rms_norm + fused_rms_norm + add op Kawrakow 2026-04-12 16:44:39 +00:00
  • 75c658e84a Fuse fused_rms + fused_rms + add Kawrakow 2026-04-12 05:21:29 +00:00
  • 191b53c2cd Fix MiniMax V-cache Hadamard with split mode graph (#1625) Kawrakow 2026-04-13 07:27:57 +02:00
  • 45eda9decf Fix typo ik/fix_minimax_hadamard Kawrakow 2026-04-13 05:24:41 +00:00
  • 67268c8fde Fix mixed KV cache: type_v_first used instead of type_v_last for last layers (#1626) Nexes the Elder 2026-04-13 07:23:13 +02:00
  • 763b34c840 Add ffn_up_gate_exps argument to MiniMax llm_build_std_moe_ffn call Kawrakow 2026-04-12 17:03:11 +00:00
  • 1f4ef503a4 Fix MiniMax V-cache Hadamard Kawrakow 2026-04-12 16:48:11 +00:00
  • 6b6f46bc3c Add reuse property to ggml_cgraph ik/graph_reuse_field Kawrakow 2026-04-11 13:49:27 +00:00
  • 08ae48c667 Better routing for Gemma4-MoE (#1615) Kawrakow 2026-04-11 15:19:02 +02:00
  • 7c7cfa4b19 Better routing for Gemma4-MoE ik/gemma4_routing Kawrakow 2026-04-11 13:11:54 +00:00
  • b0750b5d43 Fuse some ops for Gemma4-MoE (#1610) Kawrakow 2026-04-11 08:11:54 +02:00
  • 2c455ec468 Change container build action to manual dispatch Kawrakow 2026-04-11 06:10:08 +00:00
  • 869b83bc49 Add Unicode allowlist (#1597) dungquixote42 2026-04-10 12:22:57 -04:00
  • 5720a4131a Update docs (#1606) mcm007 2026-04-10 19:20:28 +03:00
  • 6bcee079f6 Update AUTHORS Kawrakow 2026-04-10 18:16:41 +02:00
  • c3f16b80fe Fuse some ops for Gemma4-MoE ik/gemma4_fuse_logits Kawrakow 2026-04-10 16:10:45 +00:00
  • 7f4d106d25 Fix for build-container (#1609) Yadir Hernandez Batista 2026-04-10 12:03:10 -04:00
  • db31e7d803 Added workflow to build container images (#1279) Yadir Hernandez Batista 2026-04-10 02:06:47 -04:00
  • 13d7178db9 Fix Gemma4-MoE graph parallel (#1604) Kawrakow 2026-04-09 17:31:09 +02:00
  • 263ff26019 Fix Gemma4-MoE graph parallel ik/gemma4_gp_bugfix Kawrakow 2026-04-09 15:24:32 +00:00
  • 557b674f63 Add llama_context to MTP (#1601) Samuel Oliveira Alves 2026-04-09 10:33:56 -03:00
  • 9b5785ad6b Gemma4 tokenizer fixes (#1603) Kawrakow 2026-04-09 15:33:28 +02:00
  • ff6c8133ad Gemma4 tokenizer fixes ik/gemma4_tokenizer_fixes Kawrakow 2026-04-09 13:20:32 +00:00
  • 847e191936 Graph parallel for Gemma4 MoE (#1600) Kawrakow 2026-04-09 14:07:29 +02:00
  • c72ec40c08 Consolidate MoE and dense graph parallel ik/sm_graph_gemma4_moe Kawrakow 2026-04-09 09:45:06 +00:00
  • 04518f4733 Disable SWA optimization Kawrakow 2026-04-09 09:43:06 +00:00
  • 9db5d9907e Mixed KV cache (#1599) Kawrakow 2026-04-09 09:33:17 +02:00
  • 7062be1a45 Merge remote-tracking branch 'origin/main' into ik/sm_graph_gemma4_moe Kawrakow 2026-04-09 06:05:40 +00:00
  • 5950d0259e Graph parallel for Gemma4-31B (#1596) Kawrakow 2026-04-09 08:00:22 +02:00
  • 6f7a9f25a8 Split mode graph for Gemma4-MoE - this works Kawrakow 2026-04-08 14:54:43 +00:00
  • 7d9ed9c4bd WIP: split mode graph for Gemma4-MoE - crashes Kawrakow 2026-04-08 13:28:35 +00:00
  • 90de8e31db Fix crash when saving/loading KV cache ik/standardize_gemma4 Kawrakow 2026-04-08 11:36:24 +00:00
  • 8fd97c4b6b Put attn_norm, attn_post_norm, ffn_norm, ffn_post_norm on all GPUs Kawrakow 2026-04-08 06:27:34 +00:00
  • e6510b661b This works! Kawrakow 2026-04-07 17:23:44 +00:00
  • 273a0bbec7 WIP Kawrakow 2026-04-07 15:16:22 +00:00
  • 7fe8ff80af WIP: Gemma4 split mode graph Kawrakow 2026-04-07 13:46:39 +00:00
  • ffa2dd4a10 WIP: Gemma4 split mode graph Kawrakow 2026-04-07 13:08:26 +00:00
  • 83fa838ede Standardize Gemma4 dense ffn Kawrakow 2026-04-06 15:13:33 +00:00
  • f3c4e90acf Use build_std_attention for Gemma4 when possible Kawrakow 2026-04-06 14:56:27 +00:00
  • 1bbc5f916d Mixed KV cache ik/mixd_kv_cache Kawrakow 2026-04-08 10:00:10 +00:00
  • fac404509c Enable Hadamard tranform for head size of 512 (#1598) Kawrakow 2026-04-08 12:04:38 +02:00
  • f8e6988224 Enable Hadamard tranform for head size of 512 ik/hadamard_512 Kawrakow 2026-04-08 10:01:42 +00:00
  • 3de81530c5 Allow tuning of the best args for speculative decoding. (#1595) Samuel Oliveira Alves 2026-04-08 03:02:42 -03:00
  • 0a6e4335f7 Little maintenance (#1579) Nexes the Elder 2026-04-08 07:58:49 +02:00
  • 67fc9c5eb9 Fix Gemma4 quantized KV cache CPU FA performance (#1590) Kawrakow 2026-04-06 19:36:55 +02:00
  • a22778b984 Fix Gemma4 quantized KV cache on CUDA (#1592) Kawrakow 2026-04-06 19:35:38 +02:00
  • 7dc03af569 Fix Gemma4 quantized KV cache on CUDA ik/fix_gemma4_quantized_KV_cache_cuda Kawrakow 2026-04-06 17:28:25 +00:00
  • 2e976ebd69 Fix Gemma4 quantized KV cache CPU FA performance ik/fix_gemma4_quantized_kv_cache_cpu Kawrakow 2026-04-06 12:05:20 +00:00
  • 86e33fd6f4 Initial Gemma4 support (#1581) Kawrakow 2026-04-06 10:01:08 +02:00
  • 6d4cdef511 Optimize mul_mat_q8_1_r8_q8_2 with AVX-512 for faster Q4_K/Q5_K prompt processing (#1578) Adam Caldwell 2026-04-06 00:07:23 -07:00
  • 2cdff066f3 gemma4: tokenizer fixes ik/gemma4 Kawrakow 2026-04-05 16:52:53 +00:00
  • 5e8bb724ce server: support slot save/restore/erase for mtmd tokens and checkpoints (#1584) firecoperana 2026-04-05 01:41:04 -05:00
  • 5e2e5ac7d0 Gemma4: Q4B/E2B appear to work now Kawrakow 2026-04-05 06:08:11 +00:00
  • 88e544ea42 server: support slot save/restore/erase for mtmd tokens and checkpoints fcp/slot_save_json firecoperana 2026-04-03 10:26:06 -05:00
  • 6aaabe46c0 Gemma4: WIP E4B/E2B Kawrakow 2026-04-04 19:20:35 +00:00
  • 0147cf4837 Add additional explanations to the pinned memory log (#1582) Kawrakow 2026-04-04 08:53:58 +02:00
  • 7961ac9a51 Add additional explanations to the pinned memory log ik/pinned_suggest Kawrakow 2026-04-04 06:49:14 +00:00
  • fd71191b2a Update README.md Kawrakow 2026-04-04 08:32:37 +02:00
  • 584df79e52 Remove log Kawrakow 2026-04-04 06:03:51 +00:00
  • 1aed5a9df6 Gemma4: this works Kawrakow 2026-04-04 05:16:06 +00:00
  • 5000b017c3 Gemma4: WIP Kawrakow 2026-04-03 14:43:34 +00:00
  • 9042b9125a Gemma4: WIP - add CPU 512, 512 FA Kawrakow 2026-04-03 14:31:58 +00:00