Commit Graph

  • c23a17b6fe cuda: use better block sizes for rms_norm (#845) Kawrakow 2025-10-21 08:12:48 +03:00
  • f5571e241e cuda: use better block sizes for rms_norm (#845) Kawrakow 2025-10-21 08:12:48 +03:00
  • 5adf63db3b Remove forgotten printf ik/rms_block_size Iwan Kawrakow 2025-10-21 08:07:02 +03:00
  • 73b50634e9 Add logs to try debugging #849 ik/debug_849 Iwan Kawrakow 2025-10-21 08:02:29 +03:00
  • c0582b4056 Minor Iwan Kawrakow 2025-10-20 17:54:08 +03:00
  • 0a9752db6c cuda: use better block sizes for rms_norm Iwan Kawrakow 2025-10-20 16:12:13 +03:00
  • 06ad8d1b2d Fix PR #842 (#844) Kawrakow 2025-10-20 11:35:57 +03:00
  • 5ae87f6cdf Fix PR #842 (#844) Kawrakow 2025-10-20 11:35:57 +03:00
  • 1c2b30c88a Fix PR #842 ik/fix_pr_842 Iwan Kawrakow 2025-10-20 11:35:08 +03:00
  • 1f072ab135 Do not allocate KV cache for unused layers (#843) Kawrakow 2025-10-20 10:09:39 +03:00
  • 22540cee60 Do not allocate KV cache for unused layers (#843) Kawrakow 2025-10-20 10:09:39 +03:00
  • acb0bc63fc Do not apply experts weight scale if it is 1 ik/no_KV_for_unused_layers Iwan Kawrakow 2025-10-20 08:21:21 +03:00
  • 599c812f12 Do not allocate KV cache for unused layers Iwan Kawrakow 2025-10-20 08:35:25 +03:00
  • 36f9601e8d Make ooae on by default and add to llama-bench (#842) Kawrakow 2025-10-20 08:32:41 +03:00
  • 1789de5994 Make ooae on by default and add to llama-bench (#842) Kawrakow 2025-10-20 08:32:41 +03:00
  • 83ab55bd5b Make ooae on by default and add to llama-bench ik/ooae_on_by_default Iwan Kawrakow 2025-10-20 07:29:58 +03:00
  • 0c050638b6 Change --n-cpu-moe to not keep expert biases on CPU (#841) Kawrakow 2025-10-19 19:03:03 +03:00
  • 2f5dae22e1 Change --n-cpu-moe to not keep expert biases on CPU (#841) Kawrakow 2025-10-19 19:03:03 +03:00
  • 7a41b3b1f5 Various fused ops around expert selection (#840) Kawrakow 2025-10-19 19:02:46 +03:00
  • 28d3e63805 Various fused ops around expert selection (#840) Kawrakow 2025-10-19 19:02:46 +03:00
  • 1d70b89d35 Also fuse sum_rows and div ik/fused_bailingmoev2 Iwan Kawrakow 2025-10-19 18:04:13 +03:00
  • 58d8b2231b Also for --cpu-moe ik/n_cpu_moe Iwan Kawrakow 2025-10-19 15:32:49 +03:00
  • d7319055de Change --n-cpu-moe to not keep expert biases on CPU Iwan Kawrakow 2025-10-19 15:21:43 +03:00
  • 0fb9d4963f cpu: turn off the openai topk fusing for now Iwan Kawrakow 2025-10-19 13:11:35 +03:00
  • b79aad9d07 Fuse topk+view+get_rows+reshape+softmax (CUDA) Iwan Kawrakow 2025-10-19 13:00:37 +03:00
  • c8ed454564 Fuse topk+view+get_rows+reshape+softmax (CPU) Iwan Kawrakow 2025-10-19 11:45:10 +03:00
  • 18d9f4fc4d Fuse sigmoid+add+topk+get_rows (CPU) Iwan Kawrakow 2025-10-19 10:03:51 +03:00
  • 8fe2bb927a Fuse sigmoid+add+topk+get_rows (CUDA) Iwan Kawrakow 2025-10-19 09:13:34 +03:00
  • f3ff1a5c48 Minor Iwan Kawrakow 2025-10-19 07:21:41 +03:00
  • 2c66dc86fc Fix CPU + CUDA Iwan Kawrakow 2025-10-18 14:55:30 +03:00
  • 8f5f93e6b1 Fuse sigmoid+add+grouped_topk+get_rows (CPU) Iwan Kawrakow 2025-10-18 10:09:32 +03:00
  • 1dcc044134 Grouped expert routing (CUDA) (#838) Kawrakow 2025-10-18 07:22:35 +03:00
  • 747f411da5 Grouped expert routing (CUDA) (#838) Kawrakow 2025-10-18 07:22:35 +03:00
  • b2d3dc7235 This is very slightly better ik/cuda_grouped_topk Iwan Kawrakow 2025-10-17 19:27:30 +03:00
  • 2bcbf9f81e cuda: grouped top_k Iwan Kawrakow 2025-10-17 18:38:53 +03:00
  • fa6fb271f3 WIP Iwan Kawrakow 2025-10-17 18:07:03 +03:00
  • 32540ac619 Ling-1T convert fixup (#837) ubergarm 2025-10-17 00:52:31 -04:00
  • 2f951a8ab5 Ling-1T convert fixup (#837) ubergarm 2025-10-17 00:52:31 -04:00
  • cde642e591 Grouped expert routing (CPU only) (#836) Kawrakow 2025-10-16 14:57:02 +03:00
  • dbfd151594 Grouped expert routing (CPU only) (#836) Kawrakow 2025-10-16 14:57:02 +03:00
  • 2c43a989e0 Merge remote-tracking branch 'origin/main' into ik/try_grouped_topk_playing1 ik/try_grouped_topk_playing1 Iwan Kawrakow 2025-10-16 11:33:56 +03:00
  • e66d307e13 Better argsort (CPU) (#835) Kawrakow 2025-10-16 11:31:03 +03:00
  • ecf8f931ea Better argsort (CPU) (#835) Kawrakow 2025-10-16 11:31:03 +03:00
  • b386d5d063 Add grouped expert routing option to llama-bench Iwan Kawrakow 2025-10-16 11:21:44 +03:00
  • 3683e50660 Add command line option to enable grouped expert routing Iwan Kawrakow 2025-10-16 10:54:14 +03:00
  • c30c35b007 Working merged grouped top_k (CPU) Iwan Kawrakow 2025-10-16 10:32:53 +03:00
  • ba3e1818a6 Trying to merge, something is not right Iwan Kawrakow 2025-10-16 09:47:33 +03:00
  • 3b8cb3beeb Cleanup Iwan Kawrakow 2025-10-16 09:22:37 +03:00
  • bc656aaa5d This seems to do the trick for grouped experts routing Iwan Kawrakow 2025-10-16 09:03:47 +03:00
  • 7dd8d9c4c1 Minor ik/cpu_argsort Iwan Kawrakow 2025-10-15 18:08:42 +03:00
  • ffb3932300 Attempt at grouped topk Iwan Kawrakow 2025-10-15 17:29:10 +03:00
  • 5118036239 Better argsort (CPU) Iwan Kawrakow 2025-10-15 16:04:24 +03:00
  • f7adde1043 Adding Ling/Ring (a.k.a., Bailing-MoE2) support (#833) Kawrakow 2025-10-15 14:20:40 +03:00
  • 9d364b88ba Adding Ling/Ring (a.k.a., Bailing-MoE2) support (#833) Kawrakow 2025-10-15 14:20:40 +03:00
  • e8a705a153 Bits and pieces ik/bailingmoe2 Iwan Kawrakow 2025-10-15 14:07:03 +03:00
  • 7705c5edca WIP Iwan Kawrakow 2025-10-15 09:36:14 +03:00
  • 2bc3fd32e7 BailingMoE2 conversion Iwan Kawrakow 2025-10-14 19:03:23 +03:00
  • 8f014df83f Add expert group selection (not working, so turned off) Iwan Kawrakow 2025-10-14 17:41:18 +03:00
  • 6dc8c79d81 Adding Ling/Ring (a.k.a., Bailing-MoE2) Iwan Kawrakow 2025-10-14 16:24:16 +03:00
  • ba9fefb73d gpt-oss: duplicate experts biases when necessary (#829) Kawrakow 2025-10-14 14:38:40 +03:00
  • 8d0d01a593 gpt-oss: duplicate experts biases when necessary (#829) Kawrakow 2025-10-14 14:38:40 +03:00
  • 2b71974af9 Fix incomplete utf-8 characters in streaming text completions (#810) Viktor Ivakin 2025-10-13 16:25:29 +03:00
  • 41bdd86555 Fix incomplete utf-8 characters in streaming text completions (#810) Viktor Ivakin 2025-10-13 16:25:29 +03:00
  • c4253f61e7 gpt-oss: duplicate experts biases when necessary ik/dup_experts_bias Iwan Kawrakow 2025-10-13 14:16:46 +03:00
  • 4e24d48e63 Attention mask tweaks for better long context performance (#825) Kawrakow 2025-10-13 14:01:11 +03:00
  • 9724ea9213 Attention mask tweaks for better long context performance (#825) Kawrakow 2025-10-13 14:01:11 +03:00
  • 21a0bfb1c0 Fix PATH_MAX not defined on Windows (#828) Kawrakow 2025-10-13 09:25:57 +03:00
  • 1db0c490be Fix PATH_MAX not defined on Windows (#828) Kawrakow 2025-10-13 09:25:57 +03:00
  • 4f738c9713 Fix PATH_MAX not defined on Windows ik/fix_827 Iwan Kawrakow 2025-10-13 09:24:40 +03:00
  • 91a798c833 Reduce KQ mask padding to 16 ik/mask_mt Iwan Kawrakow 2025-10-12 15:26:55 +03:00
  • 262cb8cc6d WIP Iwan Kawrakow 2025-10-12 15:02:39 +03:00
  • 95419ed393 With FA on, create mask as f16 directly Iwan Kawrakow 2025-10-12 14:23:05 +03:00
  • 9b02dd0405 Parallelize mask Iwan Kawrakow 2025-10-12 13:15:16 +03:00
  • 78409c95ff Fix performance regression introduced in #823 (#826) Kawrakow 2025-10-13 08:09:55 +03:00
  • 0030bc89c9 Fix performance regression introduced in #823 (#826) Kawrakow 2025-10-13 08:09:55 +03:00
  • e686c98385 Fix performance regression introduced in #823 ik/fix_perf_regression Iwan Kawrakow 2025-10-13 08:02:06 +03:00
  • 764eefd1bc Enable and clean up compiler warnings in src (#824) Kawrakow 2025-10-11 16:01:13 +03:00
  • 0ad1d34090 Enable and clean up compiler warnings in src (#824) Kawrakow 2025-10-11 16:01:13 +03:00
  • 8116e91a92 All warnings handled ik/llama_warnings Iwan Kawrakow 2025-10-11 15:54:18 +03:00
  • 91dd381140 WIP: enable and clean up warnings in src Iwan Kawrakow 2025-10-11 15:24:38 +03:00
  • 4daff01b39 Refactor file llama.cpp (#823) Kawrakow 2025-10-11 11:35:20 +03:00
  • 335a1f9b71 Refactor file llama.cpp (#823) Kawrakow 2025-10-11 11:35:20 +03:00
  • 463220a879 load -> create ik/refactor_llama.cpp Iwan Kawrakow 2025-10-11 11:30:52 +03:00
  • 4b71c16a75 We are now at 6 seconds to build the src folder Iwan Kawrakow 2025-10-11 10:13:07 +03:00
  • ca73a21a0e hparams loading Iwan Kawrakow 2025-10-10 13:51:47 +03:00
  • 6ed8f7d7e0 All graph building is now in llm-build-context.cpp Iwan Kawrakow 2025-10-10 13:29:41 +03:00
  • e8e8ac1e9f arch names Iwan Kawrakow 2025-10-10 09:25:49 +03:00
  • 24c0a6e36b llama_quantize Iwan Kawrakow 2025-10-10 09:09:43 +03:00
  • 431afecd27 LLM_TN Iwan Kawrakow 2025-10-10 08:41:48 +03:00
  • 37bf216d21 llama_build_context Iwan Kawrakow 2025-10-10 08:33:28 +03:00
  • 0582186c66 llama_model and llama_hparams Iwan Kawrakow 2025-10-09 18:15:32 +03:00
  • 51486dc3d4 Debug #733 ik/debug_issue_733 Iwan Kawrakow 2025-10-07 17:53:56 +03:00
  • 23275ac066 Remove duplicate 99% KLD output, add additional percentiles to match mainline (#817) AesSedai 2025-10-04 22:13:32 -07:00
  • f649e36a61 Remove duplicate 99% KLD output, add additional percentiles to match mainline (#817) AesSedai 2025-10-04 22:13:32 -07:00
  • 5a633bb0e9 Mark some multi-prediction tensors as not required. (#814) Downtown-Case 2025-10-01 13:37:31 -05:00
  • 6051ba25ee Mark some multi-prediction tensors as not required. (#814) Downtown-Case 2025-10-01 13:37:31 -05:00
  • 475223079c Attempt to fix AVX2 FA (#807) Kawrakow 2025-09-30 08:06:53 +02:00
  • e94d1a92a5 Attempt to fix AVX2 FA (#807) Kawrakow 2025-09-30 08:06:53 +02:00
  • f4b750a430 Attempt to fix AVX2 FA ik/try_fix_avx2_fa Iwan Kawrakow 2025-09-29 13:20:11 +03:00
  • 9932e6b102 Fix gemma3 vision (#803) Kawrakow 2025-09-27 11:15:32 +02:00