Commit Graph

  • 08a0da389c Better VRAM utilization strategy for split mode graph (#1126) Kawrakow 2026-01-09 13:36:02 +02:00
  • d14c479090 Better VRAM utilization strategy for split mode graph (#1126) Kawrakow 2026-01-09 13:36:02 +02:00
  • ae547b8502 Fix assert when --max-gpu is less than available GPUs ik/graph_better_splits Iwan Kawrakow 2026-01-09 11:15:05 +00:00
  • 0c3eedab56 Better VRAM utilization strategy for split mode graph Iwan Kawrakow 2026-01-09 09:16:46 +00:00
  • 8725d110d2 Fix data races in the reduce op (#1124) Kawrakow 2026-01-09 10:34:58 +02:00
  • a58a6a8a07 Fix data races in the reduce op (#1124) Kawrakow 2026-01-09 10:34:58 +02:00
  • d35cf5a92d Fix data races in the reduce op ik/fix_reduce_race Iwan Kawrakow 2026-01-09 08:32:00 +00:00
  • eaf2e1c15a Split mode "graph" for Ernie-4.5-MoE (#1121) Kawrakow 2026-01-08 16:46:41 +02:00
  • 145e4f4ed9 Split mode "graph" for Ernie-4.5-MoE (#1121) Kawrakow 2026-01-08 16:46:41 +02:00
  • 37caf11f2c Cleanup ik/ernie_graph Kawrakow 2026-01-08 08:18:34 +00:00
  • 8e1a625aaa Ernie-4.5-MoE split mode graph Kawrakow 2026-01-08 08:08:46 +00:00
  • 0c2d924e94 Do not abort on NCCL initizalization failure (#1120) Kawrakow 2026-01-08 09:19:50 +02:00
  • 0456aa47d3 Do not abort on NCCL initizalization failure (#1120) Kawrakow 2026-01-08 09:19:50 +02:00
  • 8308320bca Do not abort on NCCL initizalization failure ik/dont_abort_on_nccl_init_failure Iwan Kawrakow 2026-01-08 07:16:23 +00:00
  • 5ef98f8b0f Split mode "graph" for GPT-OSS (#1118) Kawrakow 2026-01-08 09:14:15 +02:00
  • d581d75537 Split mode "graph" for GPT-OSS (#1118) Kawrakow 2026-01-08 09:14:15 +02:00
  • 646fe94085 Force split_mode_f16 to false ik/gpt_oss_graph Iwan Kawrakow 2026-01-07 14:58:59 +00:00
  • 3cfb1ad6d8 Split mode "graph" for GPT-OSS Iwan Kawrakow 2026-01-07 14:42:50 +00:00
  • 9c1bef35e8 CUDA: compress-mode size (#1110) firecoperana 2026-01-07 10:33:17 -06:00
  • 1b24192873 CUDA: compress-mode size (#1110) firecoperana 2026-01-07 10:33:17 -06:00
  • 99fbd84971 Split mode "graph" for Hunyuan-MoE (#1116) Kawrakow 2026-01-07 13:38:08 +02:00
  • 8e9d66ce76 Split mode "graph" for Hunyuan-MoE (#1116) Kawrakow 2026-01-07 13:38:08 +02:00
  • edd56b1bf7 Split mode "graph" for Hunyuan-MoE ik/hunyuan_graph Iwan Kawrakow 2026-01-07 09:12:46 +00:00
  • ab1616767b Enable up to 4 GPUs for Mimo2-Flash (#1115) Kawrakow 2026-01-07 09:40:29 +02:00
  • 3c9135344b Enable up to 4 GPUs for Mimo2-Flash (#1115) Kawrakow 2026-01-07 09:40:29 +02:00
  • a29f62fc50 Enable up to 4 GPUs for Mimo2-Flash ik/mimo2_4_gpus Iwan Kawrakow 2026-01-07 07:36:00 +00:00
  • a82dcbf3ee Fix ring reduction (#1114) Kawrakow 2026-01-07 08:01:31 +02:00
  • 6bf4ffe5b9 Fix ring reduction (#1114) Kawrakow 2026-01-07 08:01:31 +02:00
  • 10c531c8de Actually enable it ik/fix_ring_reduction Iwan Kawrakow 2026-01-07 05:55:10 +00:00
  • 5f379c3098 Fix ring reduction Iwan Kawrakow 2026-01-07 05:34:59 +00:00
  • 54a513768c Disable ring reduction for now (#1112) Kawrakow 2026-01-06 15:40:50 +02:00
  • 8e9d2c328f Disable ring reduction for now (#1112) Kawrakow 2026-01-06 15:40:50 +02:00
  • 289aadb9d4 Disable ring reduction for now ik/reduce_race_quick_fix Iwan Kawrakow 2026-01-06 13:14:30 +00:00
  • 3c99284b67 Split mode 'graph' fpr Qwen3-VL (#1107) Kawrakow 2026-01-05 17:32:00 +02:00
  • d9236392cf Split mode 'graph' fpr Qwen3-VL (#1107) Kawrakow 2026-01-05 17:32:00 +02:00
  • b41f2c3ffe Split mode 'graph' fpr Qwen3-VL ik/qwen3vl_graph Kawrakow 2026-01-05 13:21:10 +00:00
  • 359cf817a9 Split mode graph for Qwen3 (#1106) Kawrakow 2026-01-05 14:31:36 +02:00
  • 218dcc5727 Split mode graph for Qwen3 (#1106) Kawrakow 2026-01-05 14:31:36 +02:00
  • a725f15d9d Split mode graph for Qwen3 ik/qwen3_graph Kawrakow 2026-01-05 08:00:30 +00:00
  • cac2b046f0 Graph parallel for Mimo-V2-Flash (#1105) Kawrakow 2026-01-05 09:58:54 +02:00
  • 419a397ce0 Graph parallel for Mimo-V2-Flash (#1105) Kawrakow 2026-01-05 09:58:54 +02:00
  • b586f89e50 Set max_gpu to 2 for Mimo2 ik/mimo2_graph Kawrakow 2026-01-05 08:49:17 +02:00
  • 066bf766d2 Cleanup Kawrakow 2026-01-05 08:42:02 +02:00
  • 9cf8c0cdde WIP Kawrakow 2025-12-29 09:52:17 +00:00
  • 9c866df62e Fix race in CUDA FA for head sizes 192/128 (#1104) Kawrakow 2026-01-05 08:21:07 +02:00
  • 385fc14110 Fix race in CUDA FA for head sizes 192/128 (#1104) Kawrakow 2026-01-05 08:21:07 +02:00
  • ae3498dabd Fix race in CUDA FA for head sizes 192/128 ik/fix_fa_192_128 Iwan Kawrakow 2026-01-05 08:17:10 +02:00
  • ab50c6cdcb Mimo-V2-Flash support (#1096) Kawrakow 2026-01-05 08:00:01 +02:00
  • 8a6622eb4f Mimo-V2-Flash support (#1096) Kawrakow 2026-01-05 08:00:01 +02:00
  • 56dceefd6b Fix windows build with CUDA (#1101) firecoperana 2026-01-04 23:59:23 -06:00
  • 1401326916 Fix windows build with CUDA (#1101) firecoperana 2026-01-04 23:59:23 -06:00
  • d7476a1b46 fix grammar for Kimi-K2 (#1103) hksdpc255 2026-01-05 16:57:25 +11:00
  • 3afd0600a1 fix grammar for Kimi-K2 (#1103) hksdpc255 2026-01-05 16:57:25 +11:00
  • 17a5a80946 Fix Windows build (#1097) Kawrakow 2025-12-29 14:18:27 +01:00
  • 5a206e3cef Fix Windows build (#1097) Kawrakow 2025-12-29 14:18:27 +01:00
  • ba0e88a5e3 Minor ik/mimo2 Iwan Kawrakow 2025-12-28 08:57:24 +00:00
  • 9b7d08eaa2 Fix quantized cache Iwan Kawrakow 2025-12-27 18:04:33 +00:00
  • 1e74cc9f94 Fix bug for head sizes not being the same Iwan Kawrakow 2025-12-27 17:16:49 +00:00
  • a95869636a Mimo-2 support Iwan Kawrakow 2025-12-27 15:12:07 +00:00
  • f878adbe90 Turn on graph reuse by default (#1094) Kawrakow 2025-12-27 08:27:16 +01:00
  • fc3be34ead Turn on graph reuse by default (#1094) Kawrakow 2025-12-27 08:27:16 +01:00
  • bf3ff8ec41 Turn on graph reuse by default ik/graph_reuse_on Iwan Kawrakow 2025-12-27 07:22:46 +00:00
  • 519405dc97 Async compute graph evaluation (2 or more GPUs) (#1089) Kawrakow 2025-12-27 08:18:06 +01:00
  • 2fe098e938 Async compute graph evaluation (2 or more GPUs) (#1089) Kawrakow 2025-12-27 08:18:06 +01:00
  • 29d323117c Command line option to turn on async. Set to false by defualt for now ik/nccl3_async Iwan Kawrakow 2025-12-27 06:24:01 +00:00
  • 7146de451d Be more careful with having set the device before using a stream (#1093) Kawrakow 2025-12-26 19:19:41 +01:00
  • f7923739cc Be more careful with having set the device before using a stream (#1093) Kawrakow 2025-12-26 19:19:41 +01:00
  • 0e059879b7 Be more careful with having set the device before using a stream ik/more_set_device Iwan Kawrakow 2025-12-26 18:17:16 +00:00
  • 07759f172c Be more careful with having set the device before using a stream Iwan Kawrakow 2025-12-26 18:17:16 +00:00
  • b79bf6c0ef Merge remote-tracking branch 'origin/main' into ik/nccl3_async Iwan Kawrakow 2025-12-26 16:36:25 +00:00
  • 8687fca3ff Graph parallel: better PP performance for 3 and more GPUs (#1092) Kawrakow 2025-12-26 17:35:27 +01:00
  • 59d0022991 Graph parallel: better PP performance for 3 and more GPUs (#1092) Kawrakow 2025-12-26 17:35:27 +01:00
  • f109274859 Graph parallel: better PP performance for 3 and more GPUs ik/ring_reduce Iwan Kawrakow 2025-12-26 15:57:19 +00:00
  • 443445579f Set omp max active levels Iwan Kawrakow 2025-12-26 05:09:27 +00:00
  • 072cd216f4 Do not use OpenMP if there are tensor overrides Iwan Kawrakow 2025-12-25 17:06:46 +00:00
  • 197de25020 Use OpenMP if available Iwan Kawrakow 2025-12-25 15:20:37 +00:00
  • 4707b09137 Merge remote-tracking branch 'origin/main' into ik/nccl3_async Iwan Kawrakow 2025-12-25 07:57:23 +00:00
  • a2ffceb235 Fix split mode graph when p2p is not enabled (#1091) Kawrakow 2025-12-25 08:55:08 +01:00
  • 03ed5f7096 Fix split mode graph when p2p is not enabled (#1091) Kawrakow 2025-12-25 08:55:08 +01:00
  • d2f52ec104 Fix split mode graph when p2p is not enabled ik/fix_no_p2p_case Iwan Kawrakow 2025-12-25 07:52:38 +00:00
  • 3be3649db9 Reduce add improvemens without NCCL (#1088) Kawrakow 2025-12-25 08:44:24 +01:00
  • 41a8d05420 Reduce add improvemens without NCCL (#1088) Kawrakow 2025-12-25 08:44:24 +01:00
  • 6803cad2f3 Scheduler changes Iwan Kawrakow 2025-12-25 07:18:51 +00:00
  • 930c9f7006 Only do async for 4 or more backends Iwan Kawrakow 2025-12-24 16:15:50 +00:00
  • 16d0dd794c Merge remote-tracking branch 'origin/main' into ik/nccl3_async Iwan Kawrakow 2025-12-24 15:28:13 +00:00
  • 723e18bb98 Reduce add improvemens without NCCL ik/reduce_no_nccl Iwan Kawrakow 2025-12-24 13:34:17 +00:00
  • ada5cc1523 Fused norm (#1086) Kawrakow 2025-12-24 15:22:43 +01:00
  • fbb67fa2bd Fused norm (#1086) Kawrakow 2025-12-24 15:22:43 +01:00
  • 5e64235d4c Be able to set reduce op data type for split mode "graph" (#1087) Kawrakow 2025-12-24 14:01:29 +01:00
  • 1ace5b7526 Be able to set reduce op data type for split mode "graph" (#1087) Kawrakow 2025-12-24 14:01:29 +01:00
  • 903377bc34 Webui: improve scroll and bug fixes (#1082) firecoperana 2025-12-24 05:30:26 -06:00
  • 2421a7e12b Webui: improve scroll and bug fixes (#1082) firecoperana 2025-12-24 05:30:26 -06:00
  • c6a3903571 Be able to set reduce op data type for split mode "graph" ik/split_mode_f32 Iwan Kawrakow 2025-12-24 10:57:41 +00:00
  • 2de3a96510 Avoid computing the attention reduce op for cohere2 ik/fused_norm Iwan Kawrakow 2025-12-24 10:14:58 +00:00
  • dc1770b8df Adding fused_norm - same idea as fused_rms_norm Iwan Kawrakow 2025-12-24 08:18:56 +00:00
  • 0d7eb34185 Graph parallel: the next generation (#1080) Kawrakow 2025-12-24 08:31:48 +01:00
  • 1d7d0225a0 Graph parallel: the next generation (#1080) Kawrakow 2025-12-24 08:31:48 +01:00
  • ef30dd8834 This sync seems enough Iwan Kawrakow 2025-12-23 05:17:22 +00:00
  • dc28cadb65 Simple async Iwan Kawrakow 2025-12-22 18:43:13 +00:00
  • d4c23f1f89 OK, let's leave it in Iwan Kawrakow 2025-12-22 17:13:23 +00:00