Commit Graph

  • 9932e6b102 Fix gemma3 vision (#803) Kawrakow 2025-09-27 11:15:32 +02:00
  • 3d4977cb6e Fix gemma3 vision (#803) Kawrakow 2025-09-27 11:15:32 +02:00
  • a438096765 Remove unnecessary assert in im2col (CPU) ik/fix_gemma3_vision Iwan Kawrakow 2025-09-27 12:11:33 +03:00
  • f4360e3aca Remove unnecessary assert in im2col Iwan Kawrakow 2025-09-27 12:07:47 +03:00
  • e2f21c8dc8 Move minja and nlohmann/json to vendor (#802) Kawrakow 2025-09-27 09:12:35 +02:00
  • 95780cddc9 Move minja and nlohmann/json to vendor (#802) Kawrakow 2025-09-27 09:12:35 +02:00
  • 54375a5587 Move minja and nlohmann/json to vendor ik/vendor Iwan Kawrakow 2025-09-27 10:11:03 +03:00
  • 346f580267 Remove stb_image.h copy in common - it is now in vendor (#801) Kawrakow 2025-09-27 08:55:42 +02:00
  • 5064ff8a54 Remove stb_image.h copy in common - it is now in vendor (#801) Kawrakow 2025-09-27 08:55:42 +02:00
  • 40c05459c3 Remove stb_image.h copy in common - it is now in vendor ik/dedup_stb_image Iwan Kawrakow 2025-09-27 09:52:58 +03:00
  • c1a0e15377 Port mdmd from mainline + Qwen2/2.5-VL support (#798) Kawrakow 2025-09-27 08:45:29 +02:00
  • 87e4762720 Port mdmd from mainline + Qwen2/2.5-VL support (#798) Kawrakow 2025-09-27 08:45:29 +02:00
  • 7d8d232896 sync: vendor (#799) firecoperana 2025-09-26 11:22:47 -05:00
  • 367654f99e sync: vendor (#799) firecoperana 2025-09-26 11:22:47 -05:00
  • be7eb79d44 Add mtmd: this fixes gibberish on second image ik/add_mtmd Iwan Kawrakow 2025-09-26 18:25:47 +03:00
  • f62995283d LOG stuff Iwan Kawrakow 2025-09-26 12:52:52 +03:00
  • e3e572fe6f Fix new/free mismatch Iwan Kawrakow 2025-09-26 12:52:25 +03:00
  • 09b3381976 Fix typo Iwan Kawrakow 2025-09-26 12:52:04 +03:00
  • 7e6a1fd912 GLU, not GPU Iwan Kawrakow 2025-09-26 11:20:05 +03:00
  • dbcc01bc43 Add mtmd: do not attempt to load a GPU backend if none are available Iwan Kawrakow 2025-09-26 09:08:48 +03:00
  • 042f595632 Add mtmd: use LOG_TEE so generated tokens show up in terminal Iwan Kawrakow 2025-09-26 08:46:42 +03:00
  • e7ddefc9c5 Add mtmd: fix swiglu Iwan Kawrakow 2025-09-26 08:39:18 +03:00
  • 933b99cbd8 Add mtmd: Qwen2.5-VL text seems to work with this change Iwan Kawrakow 2025-09-26 06:32:55 +03:00
  • f2a094d92d Add mtmd: add Qwen2-VL Iwan Kawrakow 2025-09-25 20:09:46 +03:00
  • ae9ac97246 Add mtmd: refresh CUDA rope Iwan Kawrakow 2025-09-25 18:57:35 +03:00
  • 293a59d4f1 Add mtmd: refresh CPU rope Iwan Kawrakow 2025-09-25 18:49:25 +03:00
  • 879201c26d Add CUDA implementation for GGML_OP_CONV_2D and GGML_OP_CONV_2D_DW Iwan Kawrakow 2025-09-25 18:29:52 +03:00
  • 8732eebc94 Add CPU implementation for GGML_OP_CONV_2D and GGML_OP_CONV_2D_DW Iwan Kawrakow 2025-09-25 18:20:27 +03:00
  • 292c934ac8 Add CUDA implementation for GGML_OP_GLU Iwan Kawrakow 2025-09-25 17:22:58 +03:00
  • c86dadd09f Add CPU implementation for GGML_OP_GLU Iwan Kawrakow 2025-09-25 16:23:46 +03:00
  • 24618e301b Add mtmd: builds successfully Iwan Kawrakow 2025-09-25 15:40:46 +03:00
  • 6b0c8e02a8 Add mtmd: clip.cpp compiles Iwan Kawrakow 2025-09-25 13:23:13 +03:00
  • 59133173fa Add mtmd: clip initialization compiles Iwan Kawrakow 2025-09-25 11:22:19 +03:00
  • 31a9ddb658 Add mtmd: mtmd.cpp compiles Iwan Kawrakow 2025-09-25 09:37:35 +03:00
  • 7829a6024a Add mtmd: the beginning Iwan Kawrakow 2025-09-25 08:51:06 +03:00
  • bc34573356 CPU: faster FA (#797) Kawrakow 2025-09-26 09:00:25 +02:00
  • c108e4b7c9 CPU: faster FA (#797) Kawrakow 2025-09-26 09:00:25 +02:00
  • 4f9b0ec4f0 Fix logprobs (#787) Yap Sok Ann 2025-09-25 20:43:30 +07:00
  • 6bb76b142d Fix logprobs (#787) Yap Sok Ann 2025-09-25 20:43:30 +07:00
  • 11621fe433 Avoid computing FA chunks where the mask is -infinity also for f16/bf16 ik/better_fa_masking Iwan Kawrakow 2025-09-24 17:23:40 +03:00
  • 5a4dfb5aa1 Avoid computing FA chunks where the mask is -infinity Iwan Kawrakow 2025-09-24 16:55:25 +03:00
  • f8b66238fa Fused matrix multiplications (CUDA and CPU) (#796) Kawrakow 2025-09-24 16:52:54 +02:00
  • 8e497e704e Fused matrix multiplications (CUDA and CPU) (#796) Kawrakow 2025-09-24 16:52:54 +02:00
  • 9c6988f61c Fix dequantization when requantizing (#795) Kawrakow 2025-09-24 12:44:30 +02:00
  • 0d1bbde1c4 Fix dequantization when requantizing (#795) Kawrakow 2025-09-24 12:44:30 +02:00
  • 08080356ab Fix dequantization when requantizing ik/fix_dequantize_when_requantizing Iwan Kawrakow 2025-09-24 11:44:33 +03:00
  • 15dfadccae Revert timing on committed by mistake ik/fuse_qkv Iwan Kawrakow 2025-09-24 11:00:37 +03:00
  • fc719f0a4e This is not needed Iwan Kawrakow 2025-09-24 10:59:51 +03:00
  • 44559ba4ee Use llm_build_mul_mat_qkv Iwan Kawrakow 2025-08-31 09:42:02 +03:00
  • ff4f403231 Doesn't do much on the GPU either Iwan Kawrakow 2025-08-31 08:48:51 +03:00
  • cef57a6b13 Quick attempt to fuse the Q, K, V GEMMs Iwan Kawrakow 2025-08-30 13:48:50 +03:00
  • f59b2909d4 cpu: fused softmax+topk (#794) Kawrakow 2025-09-24 09:02:21 +02:00
  • cde2eb5e95 cpu: fused softmax+topk (#794) Kawrakow 2025-09-24 09:02:21 +02:00
  • 17f7f1ed18 Update webui to handle reasoning content and include usage stats in server only when requested (#791) firecoperana 2025-09-24 00:45:09 -05:00
  • 09db3a494f Update webui to handle reasoning content and include usage stats in server only when requested (#791) firecoperana 2025-09-24 00:45:09 -05:00
  • 08d116cd02 Cleanup ik/cpu_topk_moe Iwan Kawrakow 2025-09-24 08:30:26 +03:00
  • f649c55ef3 cpu: fused softmax+topk Iwan Kawrakow 2025-09-23 20:27:22 +03:00
  • 8b4208e789 Fix #772 (#790) Kawrakow 2025-09-23 16:43:02 +02:00
  • 45afaf3391 Fix #772 (#790) Kawrakow 2025-09-23 16:43:02 +02:00
  • 079231c291 model : add grok-2 support (#782) firecoperana 2025-09-23 09:31:01 -05:00
  • 8cd2d7ccd7 model : add grok-2 support (#782) firecoperana 2025-09-23 09:31:01 -05:00
  • 0a70ca0bc0 Fix #772 ik/try_fix_772 Iwan Kawrakow 2025-09-23 17:25:47 +03:00
  • 4591e83825 cuda: fused top_k+softmax as used in most MoE models (#789) Kawrakow 2025-09-23 13:45:57 +02:00
  • 18f04350e9 cuda: fused top_k+softmax as used in most MoE models (#789) Kawrakow 2025-09-23 13:45:57 +02:00
  • 3132dd368f cuda: fused top_k+softmax as used in most MoE models ik/cuda_topk_moe Iwan Kawrakow 2025-09-23 13:59:44 +03:00
  • af5f2859c2 Fix compiler warnings (#788) Kawrakow 2025-09-23 10:30:15 +02:00
  • 64e357c327 Fix compiler warnings (#788) Kawrakow 2025-09-23 10:30:15 +02:00
  • 8794a2fecd Fix compiler warnings ik/fix_compiler_warnings Iwan Kawrakow 2025-09-23 11:28:52 +03:00
  • a6da22beb2 Deepseek V3.1 native tool calling support (OpenAI Style) (#771) firecoperana 2025-09-13 00:51:40 -05:00
  • 6d2e7ca42d Deepseek V3.1 native tool calling support (OpenAI Style) (#771) firecoperana 2025-09-13 00:51:40 -05:00
  • de97c33b40 fix convert error for ernie 4.5 (#774) firecoperana 2025-09-11 00:59:24 -05:00
  • 0ce2f93581 fix convert error for ernie 4.5 (#774) firecoperana 2025-09-11 00:59:24 -05:00
  • 8403308d8e fix v1 completions streaming mode (#768) firecoperana 2025-09-09 08:38:12 -05:00
  • d323871ba9 fix v1 completions streaming mode (#768) firecoperana 2025-09-09 08:38:12 -05:00
  • 540a26514f This is very slightly better (#762) Kawrakow 2025-09-05 21:31:02 +02:00
  • c519d4177b This is very slightly better (#762) Kawrakow 2025-09-05 21:31:02 +02:00
  • 3a6ebc7764 This is very slightly better ik/ooae2 Iwan Kawrakow 2025-09-05 10:25:41 +03:00
  • f74dd77143 Fix ggml_is_contiguously_allocated (#764) Kawrakow 2025-09-05 19:05:02 +02:00
  • c15f8ac508 Fix ggml_is_contiguously_allocated (#764) Kawrakow 2025-09-05 19:05:02 +02:00
  • 4b66e9234c Fix ggml_is_contiguously_allocated ik/fix_contiguously_allocated Iwan Kawrakow 2025-09-05 20:01:27 +03:00
  • 426032c27a Add Ernie 4.5 MOE and 0.3B Support (#759) firecoperana 2025-09-05 04:54:35 -05:00
  • 33e071201f Add Ernie 4.5 MOE and 0.3B Support (#759) firecoperana 2025-09-05 04:54:35 -05:00
  • 49979ba9e9 llama: enable K-shift for quantized KV cache for cuda (#760) firecoperana 2025-09-05 04:54:18 -05:00
  • cec8b70a7e llama: enable K-shift for quantized KV cache for cuda (#760) firecoperana 2025-09-05 04:54:18 -05:00
  • 13c3b6412e Offload only activated experts to the GPU (#698) Kawrakow 2025-09-04 12:22:30 +02:00
  • 0c15494c30 Offload only activated experts to the GPU (#698) Kawrakow 2025-09-04 12:22:30 +02:00
  • 144d456717 Better CPU SWA (#757) Kawrakow 2025-09-04 11:58:16 +02:00
  • 06cc7c6894 Better CPU SWA (#757) Kawrakow 2025-09-04 11:58:16 +02:00
  • 910a27ab9b Better CPU SWA ik/cpu_swa_v2 Iwan Kawrakow 2025-09-04 11:08:42 +03:00
  • 4a6a6f17ee Alternative CUDA FA for SWA models (#754) Kawrakow 2025-09-04 08:42:18 +02:00
  • f5e68bf8b6 Alternative CUDA FA for SWA models (#754) Kawrakow 2025-09-04 08:42:18 +02:00
  • b02e137f60 This is slightly better ik/cuda_swa3 Iwan Kawrakow 2025-09-04 09:06:31 +03:00
  • bf0b5088e0 f32 vec kernel Iwan Kawrakow 2025-09-03 14:52:32 +03:00
  • 2a09cd1c08 Need also this Iwan Kawrakow 2025-09-03 13:58:11 +03:00
  • d858286847 Using vec kernel when we have SWA Iwan Kawrakow 2025-09-03 13:45:36 +03:00
  • 3c43f9dc7d Add a command line argument ik/sched_copy_experts Iwan Kawrakow 2025-09-02 18:58:34 +03:00
  • 9ef79d3073 Log out of bounds access details Iwan Kawrakow 2025-08-16 17:48:52 +03:00
  • 15911fa35c Do not recalculate activated expers for fused up/gate Iwan Kawrakow 2025-08-16 14:05:15 +03:00
  • 5951266711 This seems to do the trick for -fmoe Iwan Kawrakow 2025-08-16 13:54:31 +03:00
  • 160fe837dc Offload only activated experts Iwan Kawrakow 2025-08-16 13:06:17 +03:00