Commit Graph

  • 0383dfb177 CUDA: set current device in compute_forward (#1039) Kawrakow 2025-12-05 16:47:50 +01:00
  • e741ec8a5d CUDA: Fix FA for Pascal GPU (#1036) firecoperana 2025-12-05 09:42:14 -06:00
  • 42e4c61243 CUDA: Fix FA for Pascal GPU (#1036) firecoperana 2025-12-05 09:42:14 -06:00
  • b18f658a7d CUDA: set current device in compute_forward ik/cuda_set_device Kawrakow 2025-12-05 15:40:48 +00:00
  • f4def9b300 Don't split the output tensor (#1038) Kawrakow 2025-12-05 15:56:53 +01:00
  • 2125f68636 Don't split the output tensor (#1038) Kawrakow 2025-12-05 15:56:53 +01:00
  • ed8a3d8e3d Don't split the output tensor ik/dont_split_output Kawrakow 2025-12-05 13:16:11 +00:00
  • b43801a2d2 Fix debug build (#1037) Kawrakow 2025-12-05 14:06:22 +01:00
  • 9264abfbaf Fix debug build (#1037) ik/fix_debug_build Kawrakow 2025-12-05 14:06:22 +01:00
  • b715342e82 K-cache Hadamard transforms (CUDA) (#1034) Kawrakow 2025-12-04 18:46:22 +01:00
  • efc8c8ef8d K-cache Hadamard transforms (CUDA) (#1034) Kawrakow 2025-12-04 18:46:22 +01:00
  • c374b221b6 Mistral3-large ik/mistral3_large Iwan Kawrakow 2025-12-04 18:05:40 +02:00
  • 6387a5800a Minor ik/k_cache_hadamard_cuda Kawrakow 2025-12-04 05:47:48 +00:00
  • 180b3f0a40 Hadamard transforms for K-cache on CUDA Kawrakow 2025-12-03 19:14:05 +00:00
  • 658ced0abd Hadamard transforms for K-cache - CPU only (#1033) Kawrakow 2025-12-04 06:51:11 +01:00
  • 18fdd80eaf Hadamard transforms for K-cache - CPU only (#1033) Kawrakow 2025-12-04 06:51:11 +01:00
  • 9c17d5f176 WIP: Hadamard transforms for K-cache ik/k_cache_hadamard Kawrakow 2025-12-03 14:26:46 +00:00
  • 08961718f3 Allow empty splits (#1029) Kawrakow 2025-12-03 13:52:41 +01:00
  • 0581f90c0f Allow empty splits (#1029) Kawrakow 2025-12-03 13:52:41 +01:00
  • bcb218102d Use standard attention for Ministral3 (#1032) Kawrakow 2025-12-03 13:43:31 +01:00
  • 90f36eb517 Use standard attention for Ministral3 (#1032) Kawrakow 2025-12-03 13:43:31 +01:00
  • ab19054a79 Use standard attention for Ministral3 ik/mistral3_std_attn Kawrakow 2025-12-03 10:51:32 +00:00
  • 74c56067b4 Fix bug in ggml_cuda_op_scale_tensor (#1031) Kawrakow 2025-12-03 11:32:19 +01:00
  • 7fbe8d3ac2 Fix bug in ggml_cuda_op_scale_tensor (#1031) Kawrakow 2025-12-03 11:32:19 +01:00
  • c5f9a5c29a Fix bug in ggml_cuda_op_scale_tensor ik/fix_cuda_scale_bug Kawrakow 2025-12-03 10:28:26 +00:00
  • fcc2df11df Adding ministral3: this seems to work (#1030) Kawrakow 2025-12-03 11:01:21 +01:00
  • cf20d0c756 Adding ministral3: this seems to work (#1030) Kawrakow 2025-12-03 11:01:21 +01:00
  • 84129f7eb6 Adding ministral3: this seems to work ik/ministral3 Iwan Kawrakow 2025-12-03 11:41:44 +02:00
  • dde8028336 WIP: allocate graph ik/graph_alloc Kawrakow 2025-12-03 07:54:53 +00:00
  • b415e734e5 Fix also output ik/allow_empty_splits Kawrakow 2025-12-03 04:53:44 +00:00
  • de1614e753 Fix type, add additional asserts Kawrakow 2025-12-03 04:44:23 +00:00
  • 452c4e14d7 Allow empty splits Kawrakow 2025-12-02 17:52:28 +00:00
  • 40097e7e41 Slightly better graph split strategy (#1026) Kawrakow 2025-12-02 18:50:52 +01:00
  • 92410bbd1e Slightly better graph split strategy (#1026) Kawrakow 2025-12-02 18:50:52 +01:00
  • 49ec5726d7 Is this better for multi-GPU and split mode "graph"? ik/is_this_better_for_multi_gpu Kawrakow 2025-12-02 08:44:46 +00:00
  • c4c266847f Slightly better graph split strategy ik/slightly_better_graph_split_strategy Kawrakow 2025-12-02 08:18:55 +00:00
  • 8e3041b263 POC: CUDA tensor parallel (MoE models) (#1022) Kawrakow 2025-12-01 19:25:40 +01:00
  • a719349982 POC: CUDA tensor parallel (MoE models) (#1022) Kawrakow 2025-12-01 19:25:40 +01:00
  • 864b496831 Try to better distribute the splits ik/poc_tp_glm4.5 Kawrakow 2025-12-01 13:18:56 +00:00
  • c51968b6d8 Split mode graph for qwen3moe Kawrakow 2025-12-01 11:56:05 +00:00
  • 63d0389e18 WIP split mode attn Kawrakow 2025-12-01 09:34:14 +00:00
  • a8cb1860b3 Guards against using merge_qkv with split mode "graph" Kawrakow 2025-12-01 07:11:54 +00:00
  • ee0f02dcb0 Guard against using split mode "graph" for unsupported models Kawrakow 2025-12-01 06:39:17 +00:00
  • a27904877a Minor Kawrakow 2025-11-30 16:56:47 +00:00
  • eb9882407f Better Kawrakow 2025-11-30 16:12:20 +00:00
  • 4fe175b555 Row-interleaved quants work Kawrakow 2025-11-30 08:02:48 +00:00
  • bbb1b1da6c Slightly better Kawrakow 2025-11-30 07:51:34 +00:00
  • c37c1bdc33 Slightly better Kawrakow 2025-11-30 06:50:03 +00:00
  • bfbfac0f1b This works but is slow Kawrakow 2025-11-30 06:38:18 +00:00
  • 072020a678 WIP tensor overrides Kawrakow 2025-11-29 17:37:27 +00:00
  • 663a9ccbbf Remove more split mode row remnants Kawrakow 2025-11-29 14:00:58 +00:00
  • bf2a1dad98 Make graph reuse work with split mode graph Kawrakow 2025-11-29 09:17:07 +00:00
  • abc5bd6e74 Work around compiler bug Kawrakow 2025-11-29 07:08:50 +00:00
  • 9e1d14f9c3 WIP GLM4.5 - this works Kawrakow 2025-11-28 15:05:01 +00:00
  • 43f644e482 WIP GLM4.5 - runs with wrong results Kawrakow 2025-11-28 14:09:24 +00:00
  • f218e16e17 Leave FFN partial results as f16 Kawrakow 2025-11-28 07:25:20 +00:00
  • ceaa71bcc8 Rename split mode "row" to split mode "graph" Kawrakow 2025-11-27 15:11:33 +00:00
  • d8d9c7bdca This results in faster PP. Kawrakow 2025-11-27 14:46:19 +00:00
  • e7d897e26f Allow for f16 source in fused_rms_norm Kawrakow 2025-11-27 14:45:35 +00:00
  • 094b51d5ae Make it work with partial offload Kawrakow 2025-11-27 07:55:21 +00:00
  • 5305590f04 Show memory used per device Kawrakow 2025-11-27 06:35:26 +00:00
  • 7d84dca29e Fix attn split Kawrakow 2025-11-27 05:03:52 +00:00
  • 52a7cbe482 Playing games with the scheduler Kawrakow 2025-11-26 20:34:37 +00:00
  • 4bcfb40711 This is slightly better Kawrakow 2025-11-26 17:18:28 +00:00
  • 97143330a1 This works, but it is slow Kawrakow 2025-11-26 15:50:46 +00:00
  • 4303587f1c WIP Kawrakow 2025-11-26 12:11:27 +00:00
  • 5d68e4eb35 WIP: it runs with wrong result Kawrakow 2025-11-26 09:27:12 +00:00
  • bc4be331ee WIP: also allocate the KV cache using tensor split Kawrakow 2025-11-25 15:30:37 +00:00
  • 32c6df015b WIP Kawrakow 2025-11-25 14:51:33 +00:00
  • 5ea430aaa4 Remove most of split mode row Kawrakow 2025-11-24 09:45:34 +02:00
  • 507f3a4d14 Fix build with RPC not enabled (#1025) Kawrakow 2025-11-30 19:04:54 +01:00
  • 02b717c8c6 Fix build with RPC not enabled (#1025) Kawrakow 2025-11-30 19:04:54 +01:00
  • 598e8e7d5f Fix build with RPC not enabled ik/fix_rpc_off2 Iwan Kawrakow 2025-11-30 20:01:51 +02:00
  • 15771072c7 RPC: support multiple devices including cpu (#1024) firecoperana 2025-11-30 11:48:02 -06:00
  • e89064e657 RPC: support multiple devices including cpu (#1024) firecoperana 2025-11-30 11:48:02 -06:00
  • 1cad1ec1cc Update grammar (#1023) firecoperana 2025-11-30 11:45:38 -06:00
  • 52adcf1e90 Update grammar (#1023) firecoperana 2025-11-30 11:45:38 -06:00
  • 869557c8fd Update mtmd to improve accuracy of M-RoPE (#993) firecoperana 2025-11-29 00:27:15 -06:00
  • 0a3e1d1449 Update mtmd to improve accuracy of M-RoPE (#993) firecoperana 2025-11-29 00:27:15 -06:00
  • d24ea9e48e server : add Anthropic Messages API support (#1012) hksdpc255 2025-11-29 17:24:59 +11:00
  • e7ecdb8f0d server : add Anthropic Messages API support (#1012) hksdpc255 2025-11-29 17:24:59 +11:00
  • ec45020e37 Leave FFN partial results as f16 ik/poc_tp Kawrakow 2025-11-28 07:25:20 +00:00
  • 259718f8cb Rename split mode "row" to split mode "graph" Kawrakow 2025-11-27 15:11:33 +00:00
  • 0b76f23334 This results in faster PP. Kawrakow 2025-11-27 14:46:19 +00:00
  • ed67bcbb2a Allow for f16 source in fused_rms_norm Kawrakow 2025-11-27 14:45:35 +00:00
  • fbbac10872 Make it work with partial offload Kawrakow 2025-11-27 07:55:21 +00:00
  • 59ab4ec151 Show memory used per device Kawrakow 2025-11-27 06:35:26 +00:00
  • 0e00db8acb Fix attn split Kawrakow 2025-11-27 05:03:52 +00:00
  • c9e129b3db Playing games with the scheduler Kawrakow 2025-11-26 20:34:37 +00:00
  • 929f328c77 This is slightly better Kawrakow 2025-11-26 17:18:28 +00:00
  • 7376d1c6eb This works, but it is slow Kawrakow 2025-11-26 15:50:46 +00:00
  • b6f00ada65 WIP Kawrakow 2025-11-26 12:11:27 +00:00
  • b703d00edc WIP: it runs with wrong result Kawrakow 2025-11-26 09:27:12 +00:00
  • 93cdd71673 WIP: also allocate the KV cache using tensor split Kawrakow 2025-11-25 15:30:37 +00:00
  • 135fc5f4c1 WIP Kawrakow 2025-11-25 14:51:33 +00:00
  • df8704ca78 Remove most of split mode row Kawrakow 2025-11-24 09:45:34 +02:00
  • bcdd3031d8 Attempt to fix #1014 (#1017) Kawrakow 2025-11-27 15:58:18 +01:00
  • d6daee337c Attempt to fix #1014 (#1017) Kawrakow 2025-11-27 15:58:18 +01:00
  • 4c4c84ba7f Attempt to fix #1014 ik/try_fix_1014 Iwan Kawrakow 2025-11-27 11:11:23 +02:00
  • e60f71887b Fix llama-bench mla parameter (#1016) Kawrakow 2025-11-27 09:33:30 +01:00