Commit Graph

  • a9f37c2f80 Hopefully better Kawrakow 2026-01-19 07:42:57 +00:00
  • 61eccfcf0d More formatting Kawrakow 2026-01-18 15:49:29 +00:00
  • b2c9689762 Once at it, let's fix the formatting too Kawrakow 2026-01-18 15:01:46 +00:00
  • 6c65430257 A hopefully more efficient adaptive_p sampling Kawrakow 2026-01-18 14:56:30 +00:00
  • 0c0b6e4b8b Copy reduce result to other GPUs if necessary (#1156) Kawrakow 2026-01-19 08:40:26 +02:00
  • ae5c269371 More models ik/skip_get_rows Kawrakow 2026-01-18 13:37:40 +00:00
  • c7deb32142 For the output ops use the result of the split that ran on the main GPU Kawrakow 2026-01-18 12:53:34 +00:00
  • a26adbcf5d Avoid ggml_get_rows for TG Kawrakow 2026-01-18 11:31:35 +00:00
  • fb5c340e17 Copy reduce result to other GPUs if necessary ik/reduce_make_copies Kawrakow 2026-01-18 07:00:06 +00:00
  • 6dfbef27ec Adaptive p: bugfix + optimization + refactor (#1155) dungquixote42 2026-01-18 01:26:06 -05:00
  • d71a3ec315 Server: refactor and rename functions (#1151) firecoperana 2026-01-18 00:16:57 -06:00
  • 7024fdbc72 Additional graph reduce types for split mode graph (#1154) Kawrakow 2026-01-18 08:02:49 +02:00
  • 73b8fea90b This finally works ik/extra_reduce_types Kawrakow 2026-01-17 17:25:57 +00:00
  • 288a8cf842 WIP: add Q8_0 and BF16 as possible reduce types Kawrakow 2026-01-17 15:09:26 +00:00
  • ee463b079e Webui: add text completions and adaptive_p sampling (#1153) firecoperana 2026-01-17 00:37:07 -06:00
  • 709e1a5375 Fixing split mode graph with many GPUs (#1152) Kawrakow 2026-01-17 08:05:24 +02:00
  • c6c890e164 WIP - still deadlocking ik/try_fix_many_gpus_2 Kawrakow 2026-01-16 15:07:23 +00:00
  • 4730b3e1f0 printf cleanup ik/try_fix_many_gpus Kawrakow 2026-01-15 14:33:54 +00:00
  • 7878553ad6 Reenable OpenMP in scheduler async Kawrakow 2026-01-15 14:20:24 +00:00
  • d6e5fb00d6 WIP: this seems more stable Kawrakow 2026-01-15 13:29:27 +00:00
  • 99890edf7e Attempt to fix the many GPU issue in split mode graph Kawrakow 2026-01-15 08:45:52 +00:00
  • cb1063f6cd Fix experts/shared experts split (#1147) Kawrakow 2026-01-14 15:35:16 +02:00
  • e65782de67 Fix experts/shared experts split ik/fix_exp_shexp_split Kawrakow 2026-01-14 13:26:09 +00:00
  • 3a0b234669 Add context management to the MiroThinker template (simulate official agent behavior) (#1143) hksdpc255 2026-01-14 03:08:59 +11:00
  • 672df48ed1 server: keep logit bias unchanged when client does not set it (#1144) firecoperana 2026-01-13 10:08:09 -06:00
  • 0adff91363 Make adding tensor overrides to llama-bench table optional (#1141) Kawrakow 2026-01-13 11:08:13 +02:00
  • 4fd797c863 Make adding tensor overrides to llama-bench table optional ik/llama_bench_overrides Kawrakow 2026-01-13 08:55:38 +00:00
  • 9d9ed6a032 Add -sas, --scheduler-async to llama-bench (#1140) Kawrakow 2026-01-13 10:23:50 +02:00
  • 81c466835d Add -sas, --scheduler-async to llama-bench ik/llama_bench_sas Kawrakow 2026-01-13 08:21:44 +00:00
  • e1c4c4a495 Fix Anthropic Messages API (#1136) hksdpc255 2026-01-13 17:37:29 +11:00
  • 013831bba5 Fix compilation errors Kawrakow 2026-01-13 08:12:49 +02:00
  • 978202a754 Merge ffn_up and ffn_gate experts tensors (part 2) (#1139) Kawrakow 2026-01-13 08:07:52 +02:00
  • 54a1f68d32 Add chat parser for MiroThinker (#1138) hksdpc255 2026-01-13 17:07:12 +11:00
  • 1a461525d5 server: stop processing the prompt when client disconnects (#1134) firecoperana 2026-01-12 23:56:59 -06:00
  • d3e3ad40f9 Compiler warning and white space Kawrakow 2026-01-12 19:06:17 +02:00
  • a50bd821ec Also Qwen3VL-MoE ik/merge_up_gate_exps_3 Kawrakow 2026-01-12 18:52:15 +02:00
  • 0a18f1fadd All the others Iwan Kawrakow 2026-01-12 16:22:53 +00:00
  • c771666d04 We need to of course pass the merged tensor to build_ffn Iwan Kawrakow 2026-01-12 16:05:14 +00:00
  • 60ccbe7bcd Add ability to merge up+gate exps to more models Kawrakow 2026-01-12 17:00:09 +02:00
  • c03c2d7cc6 Merge ffn_up and ffn_gate experts tensors (#1137) Kawrakow 2026-01-12 18:30:53 +02:00
  • 5d0123313a All the others ik/merge_up_gate_exps_2 Iwan Kawrakow 2026-01-12 16:22:53 +00:00
  • ab1ec19151 We need to of course pass the merged tensor to build_ffn Iwan Kawrakow 2026-01-12 16:05:14 +00:00
  • aad40bcd2d Add ability to merge up+gate exps to more models Kawrakow 2026-01-12 17:00:09 +02:00
  • 905bca2e1c Cleanup ik/fuse_merge_up_gate_exps Kawrakow 2026-01-12 15:28:06 +02:00
  • bf0c6c57bb addOpenGLRunpath -> autoAddDriverRunpath in .devops/nix/package.nix (#1135) bndlfm 2026-01-12 07:16:37 -06:00
  • 74dc8aa99e Arghh, we need to increase the context size again Kawrakow 2026-01-12 14:59:15 +02:00
  • 9821ac7b9c When no bias, allow merging up/gate with tensor overrides Kawrakow 2026-01-12 13:33:26 +02:00
  • ec105a80bc Turn off merge_up_gate_exps if split mode graph Kawrakow 2026-01-12 13:00:50 +02:00
  • 7ad7d8339b Add merge up/gate command line parameter to llama-bench Kawrakow 2026-01-12 12:57:18 +02:00
  • 7671335ac9 Add command line option to merge experts up/gate Kawrakow 2026-01-12 12:49:18 +02:00
  • 80f2b090d5 Minor Kawrakow 2026-01-12 12:26:53 +02:00
  • c7ae5d4eeb WIP: TG seems to be working Kawrakow 2026-01-12 12:23:18 +02:00
  • 3a848fc48c WIP - Qwen3-MoE (and hopefully all others) working Kawrakow 2026-01-12 11:55:01 +02:00
  • 4e4fabf0b4 WIP Kawrakow 2026-01-12 11:39:50 +02:00
  • 3d9ee861f8 WIP - GPT-OSS working Kawrakow 2026-01-12 11:10:03 +02:00
  • 77bd2effa4 WIP - not working Kawrakow 2026-01-11 12:52:37 +02:00
  • 6ba5772b07 WIP - not working Kawrakow 2026-01-11 12:29:38 +02:00
  • 738dc60b78 We don't need these ik/try_authors Kawrakow 2026-01-10 15:31:06 +00:00
  • 1ee36144a8 WIP - something is wrong ik/bailingmoe2_graph Kawrakow 2026-01-10 13:17:22 +00:00
  • c7348f6f55 Fix mla = 0 (#1130) Kawrakow 2026-01-10 10:34:30 +02:00
  • d329029dde Fix mla = 0 ik/deepseek_mla0 Kawrakow 2026-01-10 08:27:57 +00:00
  • c7dba35702 Update AUTHORS (#1129) Kawrakow 2026-01-10 08:10:21 +02:00
  • 39e57c1b57 Update AUTHORS ik/update_authors Iwan Kawrakow 2026-01-10 08:09:34 +02:00
  • c03ee1a4d2 server: improve speed of speculative decoding (#1119) firecoperana 2026-01-10 00:01:22 -06:00
  • 52ad1c6421 Implement Adaptive-P Sampler (#1100) dungquixote42 2026-01-10 00:58:53 -05:00
  • dd3c3f72f2 Fix split mode graph for GPT-OSS with partial offload (#1128) Kawrakow 2026-01-10 07:57:43 +02:00
  • 58f3784821 Fix split mode graph for GPT-OSS with partial offload ik/fix_gpt_oss_partial_offload Iwan Kawrakow 2026-01-09 16:57:30 +00:00
  • 08a0da389c Better VRAM utilization strategy for split mode graph (#1126) Kawrakow 2026-01-09 13:36:02 +02:00
  • ae547b8502 Fix assert when --max-gpu is less than available GPUs ik/graph_better_splits Iwan Kawrakow 2026-01-09 11:15:05 +00:00
  • 0c3eedab56 Better VRAM utilization strategy for split mode graph Iwan Kawrakow 2026-01-09 09:16:46 +00:00
  • 8725d110d2 Fix data races in the reduce op (#1124) Kawrakow 2026-01-09 10:34:58 +02:00
  • d35cf5a92d Fix data races in the reduce op ik/fix_reduce_race Iwan Kawrakow 2026-01-09 08:32:00 +00:00
  • eaf2e1c15a Split mode "graph" for Ernie-4.5-MoE (#1121) Kawrakow 2026-01-08 16:46:41 +02:00
  • 37caf11f2c Cleanup ik/ernie_graph Kawrakow 2026-01-08 08:18:34 +00:00
  • 8e1a625aaa Ernie-4.5-MoE split mode graph Kawrakow 2026-01-08 08:08:46 +00:00
  • 0c2d924e94 Do not abort on NCCL initialization failure (#1120) Kawrakow 2026-01-08 09:19:50 +02:00
  • 8308320bca Do not abort on NCCL initialization failure ik/dont_abort_on_nccl_init_failure Iwan Kawrakow 2026-01-08 07:16:23 +00:00
  • 5ef98f8b0f Split mode "graph" for GPT-OSS (#1118) Kawrakow 2026-01-08 09:14:15 +02:00
  • 646fe94085 Force split_mode_f16 to false ik/gpt_oss_graph Iwan Kawrakow 2026-01-07 14:58:59 +00:00
  • 3cfb1ad6d8 Split mode "graph" for GPT-OSS Iwan Kawrakow 2026-01-07 14:42:50 +00:00
  • 9c1bef35e8 CUDA: compress-mode size (#1110) firecoperana 2026-01-07 10:33:17 -06:00
  • 99fbd84971 Split mode "graph" for Hunyuan-MoE (#1116) Kawrakow 2026-01-07 13:38:08 +02:00
  • edd56b1bf7 Split mode "graph" for Hunyuan-MoE ik/hunyuan_graph Iwan Kawrakow 2026-01-07 09:12:46 +00:00
  • ab1616767b Enable up to 4 GPUs for Mimo2-Flash (#1115) Kawrakow 2026-01-07 09:40:29 +02:00
  • a29f62fc50 Enable up to 4 GPUs for Mimo2-Flash ik/mimo2_4_gpus Iwan Kawrakow 2026-01-07 07:36:00 +00:00
  • a82dcbf3ee Fix ring reduction (#1114) Kawrakow 2026-01-07 08:01:31 +02:00
  • 10c531c8de Actually enable it ik/fix_ring_reduction Iwan Kawrakow 2026-01-07 05:55:10 +00:00
  • 5f379c3098 Fix ring reduction Iwan Kawrakow 2026-01-07 05:34:59 +00:00