Commit Graph

  • e28148d401 WIP ik/nccl1 Kawrakow 2025-12-20 06:50:58 +00:00
  • 5604ec9eae WIP: Runs with wrong results Kawrakow 2025-12-19 18:06:49 +00:00
  • d44b2fa4ab WIP: one needs to call ncclGroupStart() and ncclGroupEnd() Kawrakow 2025-12-19 17:34:48 +00:00
  • 0d552e9b38 WIP: it blocks on ncclAllReduce Kawrakow 2025-12-19 17:13:22 +00:00
  • 158f2f03f1 Add GGML_OP_REDUCE Kawrakow 2025-12-19 08:27:24 +00:00
  • ea798acd10 nccl: initial setup (cmake find if installed, initialize) Kawrakow 2025-12-19 07:10:02 +00:00
  • 21fc9322f9 cuda: set device to src device before p2p copy (#1073) Kawrakow 2025-12-17 12:50:34 +01:00
  • ecabd6acf7 cuda: set device to src device before p2p copy (#1073) Kawrakow 2025-12-17 12:50:34 +01:00
  • 64908da772 cuda: set device to src device before p2p copy ik/p2p_cpy_set_device Kawrakow 2025-12-17 11:43:36 +00:00
  • 7bb79eff48 add split-mode-graph-scheduling parameter (#1068) Nexes the Elder 2025-12-17 07:58:19 +01:00
  • d1dd45b4b9 add split-mode-graph-scheduling parameter (#1068) Nexes the Elder 2025-12-17 07:58:19 +01:00
  • 51eea5715f Better PP performance with split mode "graph" and 3+ GPUs (#1069) Kawrakow 2025-12-17 07:40:25 +01:00
  • 5585ac2aa8 Better PP performance with split mode "graph" and 3+ GPUs (#1069) Kawrakow 2025-12-17 07:40:25 +01:00
  • 0864655a72 Disable split scheduling with tensor overrides ik/better_graph_pp Iwan Kawrakow 2025-12-17 06:38:18 +00:00
  • 7a7029ec99 OK, this seems to be working Iwan Kawrakow 2025-12-16 18:40:53 +00:00
  • 2fcf407ab3 Nah, it is not working Iwan Kawrakow 2025-12-16 18:06:30 +00:00
  • f44844b328 Looks like with this change it is working with tensor overrides Iwan Kawrakow 2025-12-16 17:41:04 +00:00
  • 58ef9d608e Fix bug and cleanup Iwan Kawrakow 2025-12-16 13:25:36 +00:00
  • ec2ba592b5 Command line option to set max. extra VRAM that the scheduler can use Iwan Kawrakow 2025-12-16 06:35:06 +00:00
  • 5235c8b3e4 This should do the trick for PP Iwan Kawrakow 2025-12-16 06:07:23 +00:00
  • 75de0528c3 Much better TG speed with split mode "graph" (#1067) Kawrakow 2025-12-16 19:48:20 +01:00
  • 8ccceff4e9 Much better TG speed with split mode "graph" (#1067) Kawrakow 2025-12-16 19:48:20 +01:00
  • a750d3aa03 Fix log issue for llama-cli (#1071) firecoperana 2025-12-16 11:12:16 -06:00
  • 756c3f8f43 Fix log issue for llama-cli (#1071) firecoperana 2025-12-16 11:12:16 -06:00
  • 67383c7e3c Add back the fix for Kimi-K2 tool-call parsing issues (#1070) firecoperana 2025-12-16 07:44:47 -06:00
  • 269cc761db Add back the fix for Kimi-K2 tool-call parsing issues (#1070) firecoperana 2025-12-16 07:44:47 -06:00
  • 5a731064e6 Much better TG speed with split mode "graph" ik/better_graph_tg Kawrakow 2025-12-15 13:52:43 +00:00
  • 090f354d33 Refactor chat and server file (#1062) firecoperana 2025-12-15 01:27:20 -06:00
  • 0e91b89cd3 Refactor chat and server file (#1062) firecoperana 2025-12-15 01:27:20 -06:00
  • 0a36cea555 Use actual active number of layers when preparing splits (#1065) Kawrakow 2025-12-14 07:44:13 +01:00
  • 7b03c9dcef Use actual active number of layers when preparing splits (#1065) Kawrakow 2025-12-14 07:44:13 +01:00
  • 664a529332 Use actual active number of layers when preparing splits ik/ignore_nextn_layers Kawrakow 2025-12-14 06:41:41 +00:00
  • f90d1fdd06 Split mode "graph" for Cohere2 (#1061) Kawrakow 2025-12-13 20:30:08 +01:00
  • d97a6de34d Split mode "graph" for Cohere2 (#1061) Kawrakow 2025-12-13 20:30:08 +01:00
  • f81c0b7fa0 WIP ik/cohere2_sm_graph Iwan Kawrakow 2025-12-13 16:45:55 +00:00
  • 0a4e12f997 Equal split Iwan Kawrakow 2025-12-13 15:50:04 +00:00
  • d11def5ceb WIP Iwan Kawrakow 2025-12-13 14:52:23 +00:00
  • 81fc5e3f08 To not lose this again Iwan Kawrakow 2025-12-13 08:14:08 +00:00
  • f537d49928 This is better for PP: 600 t/s -> 700 t/s Iwan Kawrakow 2025-12-13 07:38:57 +00:00
  • 117eaf9c9e Apply f_logit_scale before mul mat with output tensor Iwan Kawrakow 2025-12-12 15:45:56 +00:00
  • 3100f03770 Better Iwan Kawrakow 2025-12-12 15:39:14 +00:00
  • 328c3ff5e0 This works and TG is decent, but PP is low Iwan Kawrakow 2025-12-12 14:59:33 +00:00
  • 844a8b0bfa Fix sync logic (#1064) Kawrakow 2025-12-13 18:40:49 +01:00
  • 5645be6cfc Fix sync logic (#1064) Kawrakow 2025-12-13 18:40:49 +01:00
  • d82ed383ce Fix sync logic ik/fix_sync_logic Iwan Kawrakow 2025-12-13 17:39:42 +00:00
  • 2e04b7cbef Undo sync reduction (#1063) Kawrakow 2025-12-13 16:58:32 +01:00
  • f667bd58b0 Undo sync reduction (#1063) Kawrakow 2025-12-13 16:58:32 +01:00
  • 72af525c9f Undo sync reduction ik/undo_sync_reduction Iwan Kawrakow 2025-12-13 15:57:07 +00:00
  • 093cc7c380 Do not use split mode graph scheduling if there are tensor overrides (#1060) Kawrakow 2025-12-12 14:48:38 +01:00
  • df02c39650 Do not use split mode graph scheduling if there are tensor overrides (#1060) Kawrakow 2025-12-12 14:48:38 +01:00
  • 082545b3f0 Do not use split mode graph scheduling if there are tensor overrides ik/undo_1049_if_tensor_overrides Iwan Kawrakow 2025-12-12 13:36:02 +00:00
  • b3a19a6f37 Fix overflow in offset calculation in mmq (#1059) Kawrakow 2025-12-12 14:31:06 +01:00
  • cc14d4a3cc Fix overflow in offset calculation in mmq (#1059) Kawrakow 2025-12-12 14:31:06 +01:00
  • 50fbde85dc Fix overflow in offset calculation in mmq ik/fix_mmq_overflow Iwan Kawrakow 2025-12-12 13:22:02 +00:00
  • 53fb7a4118 Be able to enable or disable P2P via command line argument (#1058) Kawrakow 2025-12-12 13:36:42 +01:00
  • b74fb479af Be able to enable or disable P2P via command line argument (#1058) Kawrakow 2025-12-12 13:36:42 +01:00
  • f65fefa36c Slightly faster TG for split mode "graph" (#1057) Kawrakow 2025-12-12 07:54:37 +01:00
  • 0698501ae2 Slightly faster TG for split mode "graph" (#1057) Kawrakow 2025-12-12 07:54:37 +01:00
  • 643cccd2c8 This is better ik/sm_graph_rearrange Iwan Kawrakow 2025-12-12 06:23:39 +00:00
  • ca1e7070f6 Be able to enable or disable P2P via command line argument ik/disable_or_enable_p2p Iwan Kawrakow 2025-12-11 17:46:54 +00:00
  • 07ae4b1ef0 Separate graph compute implementation for split mode graph Iwan Kawrakow 2025-12-11 16:53:54 +00:00
  • e6603ec882 Rearrange graph nodes Iwan Kawrakow 2025-12-11 16:12:55 +00:00
  • bf03f63c34 Fix #1055 (#1056) Kawrakow 2025-12-11 14:44:32 +01:00
  • 6a0e72aeae Fix #1055 (#1056) Kawrakow 2025-12-11 14:44:32 +01:00
  • e094f32467 Fix #1055 ik/fix_1055 Iwan Kawrakow 2025-12-11 13:26:41 +00:00
  • 37e41d22dc enable peer access (NVlink) (#1050) abc-nix 2025-12-11 07:31:56 +00:00
  • 0feb046e6b enable peer access (NVlink) (#1050) abc-nix 2025-12-11 07:31:56 +00:00
  • 279b4c44fc Fix the fix (#1054) Kawrakow 2025-12-11 08:05:33 +01:00
  • 59dba9f778 Fix the fix (#1054) Kawrakow 2025-12-11 08:05:33 +01:00
  • b41b17943d Fix the fix ik/fix_the_fix Iwan Kawrakow 2025-12-11 09:03:52 +02:00
  • 22863cf9c9 Be able to set a max. number of GPUs to be used in split mode graph (#1051) Kawrakow 2025-12-11 07:22:53 +01:00
  • 9484d150d8 Be able to set a max. number of GPUs to be used in split mode graph (#1051) Kawrakow 2025-12-11 07:22:53 +01:00
  • c953b47266 Be able to set a max. number of GPUs to be used in split mode graph ik/sm_graph_max_gpu Iwan Kawrakow 2025-12-10 15:55:45 +02:00
  • a2efa22f10 Fix llama-bench - missing buffer override comparison operator (#1053) Kawrakow 2025-12-11 07:21:06 +01:00
  • 6a5a707ac0 Fix llama-bench - missing buffer override comparison operator (#1053) Kawrakow 2025-12-11 07:21:06 +01:00
  • b37fafdc39 Fix llama-bench - missing buffer override comparison operator ik/fix_bench_compile Iwan Kawrakow 2025-12-11 08:18:45 +02:00
  • 02206cff46 Reduce back-end syncs (#1049) Kawrakow 2025-12-11 07:04:44 +01:00
  • 00d939c811 Reduce back-end syncs (#1049) Kawrakow 2025-12-11 07:04:44 +01:00
  • e61cd0303a QoL/bugfixes for llama-bench (#1052) i4TsU 2025-12-11 16:04:15 +10:00
  • 62f907c663 QoL/bugfixes for llama-bench (#1052) i4TsU 2025-12-11 16:04:15 +10:00
  • b0cc63bcdf Another attempt for sm graph ik/sm_graph_sync Kawrakow 2025-12-09 19:30:06 +00:00
  • a2f5614529 Try to split offloaded MoE up/gate up ik/try_split_offloaded_moe_up_gate Kawrakow 2025-12-09 10:09:04 +00:00
  • ccf72a0e46 Also this ik/backend_reduce_syncs Kawrakow 2025-12-09 06:36:31 +00:00
  • d0fec69966 Reduce backend synchronization calls Kawrakow 2025-12-09 06:19:50 +00:00
  • 53f693a708 KV cache read/write for split mode "graph" (#1048) Kawrakow 2025-12-09 06:50:53 +01:00
  • 5fe3979951 KV cache read/write for split mode "graph" (#1048) Kawrakow 2025-12-09 06:50:53 +01:00
  • c83d2fd335 WIP ik/split_graph_2 Kawrakow 2025-12-08 15:44:53 +00:00
  • be8e7057b3 Handle split cache (read) ik/handle_split_cache Kawrakow 2025-12-08 10:55:35 +02:00
  • 1e50392cd0 Handle split cache (write) Kawrakow 2025-12-08 10:22:38 +02:00
  • 808ce4907c Unroll for loop for repacked BF16 MATMUL (#1047) Djip007 2025-12-08 06:09:45 +01:00
  • 5669d39036 Unroll for loop for repacked BF16 MATMUL (#1047) Djip007 2025-12-08 06:09:45 +01:00
  • 66f21fb174 WIP - factor out split ffn Kawrakow 2025-12-06 15:47:10 +00:00
  • 22ac19958f WIP - factor out split attention Kawrakow 2025-12-06 09:44:52 +00:00
  • c9fcfb9a7a Fix annoying compiler warnings (#1042) Kawrakow 2025-12-06 09:59:07 +01:00
  • 2f645f2579 Fix annoying compiler warnings (#1042) Kawrakow 2025-12-06 09:59:07 +01:00
  • 0e683f24ad Fix annoying compiler warnings ik/fix_annoying_warnings Kawrakow 2025-12-06 08:57:50 +00:00
  • 87f6943e4b Automatically disable CUDA graphs for split mode "graph" (#1040) Kawrakow 2025-12-06 07:38:02 +01:00
  • e02b71f89e Automatically disable CUDA graphs for split mode "graph" (#1040) Kawrakow 2025-12-06 07:38:02 +01:00
  • a4da6e298a Automatically disable CUDA graphs for split mode "graph" ik/sm_graph_disable_cuda_graphs Kawrakow 2025-12-05 17:00:58 +00:00
  • a3737f4296 CUDA: set current device in compute_forward (#1039) Kawrakow 2025-12-05 16:47:50 +01:00