Commit Graph

  • fcb0b472dd Fix kv cache save and load for GLM model (#965) firecoperana 2025-11-15 15:04:16 +00:00
  • 5ec0def0ef Fix compiler warnings (#963) firecoperana 2025-11-15 05:07:15 +00:00
  • f3db96e539 Fix compiler warnings (#963) firecoperana 2025-11-15 05:07:15 +00:00
  • bb358223cd server: cache prompt to host memory (#954) firecoperana 2025-11-14 16:40:13 +00:00
  • 0cb6dcc8c8 server: cache prompt to host memory (#954) firecoperana 2025-11-14 16:40:13 +00:00
  • 00dffb5e68 Add --chat-template-file to usage (#959) Kawrakow 2025-11-14 11:08:58 +02:00
  • 2642f48921 Add --chat-template-file to usage (#959) Kawrakow 2025-11-14 11:08:58 +02:00
  • ca03d07bb6 Add --chat-template-file to usage ik/add_jinja_file_help Iwan Kawrakow 2025-11-14 11:07:32 +02:00
  • 668c37d4cf DeepSeek: enable option to merge Q and K tensors (#941) Kawrakow 2025-11-14 08:23:04 +02:00
  • 9e2b21fbc9 DeepSeek: enable option to merge Q and K tensors (#941) Kawrakow 2025-11-14 08:23:04 +02:00
  • 177b5d2a47 Fix cuda init error in rpc (#957) firecoperana 2025-11-14 04:59:54 +00:00
  • ba1a753106 Fix cuda init error in rpc (#957) firecoperana 2025-11-14 04:59:54 +00:00
  • c64e3e3482 Fix fused up+gate when mmq is not supported (#952) Kawrakow 2025-11-14 06:59:27 +02:00
  • 716d318fd8 Fix fused up+gate when mmq is not supported (#952) Kawrakow 2025-11-14 06:59:27 +02:00
  • a1f60b3535 Add missing AVX512 operators for MSVC (#948) Kawrakow 2025-11-14 06:58:51 +02:00
  • 459bf5812d Add missing AVX512 operators for MSVC (#948) Kawrakow 2025-11-14 06:58:51 +02:00
  • 6b9d1bf4b4 Graph reuse (#947) Kawrakow 2025-11-14 06:58:19 +02:00
  • 41bde27541 Graph reuse (#947) Kawrakow 2025-11-14 06:58:19 +02:00
  • 03f43ee612 Merge branch 'main' into ik/graph_reuse ik/graph_reuse Kawrakow 2025-11-13 19:27:30 +02:00
  • 22c20fcd6d Fix flash attention long argument for mainloine compatibility Kawrakow 2025-11-13 19:22:16 +02:00
  • be1a8cb9d8 Fix flash attention long argument for mainloine compatibility Iwan Kawrakow 2025-11-13 19:22:16 +02:00
  • 9519a834ac Fix fused up+gate when mmq is not supported ik/fix_up_gate_mmq_not_supported Iwan Kawrakow 2025-11-13 18:43:21 +02:00
  • ce3ce97a29 Fix repacked legacy quants (#951) Kawrakow 2025-11-13 15:35:37 +02:00
  • f4202c812e Fix repacked legacy quants (#951) Kawrakow 2025-11-13 15:35:37 +02:00
  • d296ab79be Fix q4_0_r8 also on Zen4 ik/fix_repacked_legacy_quants Iwan Kawrakow 2025-11-13 12:52:25 +02:00
  • 2396c41ef8 Fix q5_0_r4 and q6_0_r4 also on Zen4 Iwan Kawrakow 2025-11-13 12:49:34 +02:00
  • 64126305da Fix q4_0_r8 Iwan Kawrakow 2025-11-13 12:12:53 +02:00
  • 742328bfcd Fix q6_0_r4 Iwan Kawrakow 2025-11-13 11:49:18 +02:00
  • c0f6116c49 Fix q5_0_r4 Iwan Kawrakow 2025-11-13 11:43:52 +02:00
  • 88c02fa108 Set default MLA to 3 also in llama-bench (#949) Kawrakow 2025-11-13 09:52:06 +02:00
  • 38abd0e289 Set default MLA to 3 also in llama-bench (#949) Kawrakow 2025-11-13 09:52:06 +02:00
  • aba78ceafa Set default MLA to 3 also in llama-bench ik/llama_bench_mla3 Iwan Kawrakow 2025-11-13 09:50:23 +02:00
  • e1d669fb34 Add missing AVX512 operators for MSVC ik/fix_windows_avx512 Iwan Kawrakow 2025-11-13 09:09:58 +02:00
  • 874926800f Add mainline compatible FA command line option (#944) Kawrakow 2025-11-13 08:55:33 +02:00
  • bbc127d10e Add mainline compatible FA command line option (#944) Kawrakow 2025-11-13 08:55:33 +02:00
  • 32edcb4b74 Fix rope_norm_fast_cuda (#945) Kawrakow 2025-11-13 08:54:37 +02:00
  • 5f73351638 Fix rope_norm_fast_cuda (#945) Kawrakow 2025-11-13 08:54:37 +02:00
  • 6d799ea36b Also fix mrope and vision ik/fix_rope_norm_fast_cuda Iwan Kawrakow 2025-11-13 08:48:38 +02:00
  • 1518f9b802 Change the command line option to -gr Iwan Kawrakow 2025-11-13 07:33:24 +02:00
  • a4c29905d0 This is perhaps cleaner Iwan Kawrakow 2025-11-13 06:36:33 +02:00
  • a9671fe368 This seems to work Iwan Kawrakow 2025-11-13 06:23:10 +02:00
  • d18523c8e9 One more Iwan Kawrakow 2025-11-12 17:06:50 +02:00
  • 2aee6a0d94 Fix rope_norm_fast_cuda Iwan Kawrakow 2025-11-12 17:00:13 +02:00
  • 59ee8d7823 WIP Iwan Kawrakow 2025-11-12 16:54:09 +02:00
  • ac409b4c7f Graph reuse: add command line argument to turn it on Iwan Kawrakow 2025-11-12 14:52:13 +02:00
  • 14e06e26a5 Add mainline compatible FA command line option ik/fa_mainline_compat Iwan Kawrakow 2025-11-12 11:12:34 +02:00
  • ddc88bac17 Set mla=3 by default (#943) Kawrakow 2025-11-12 11:00:58 +02:00
  • 8a8de91a42 Set mla=3 by default (#943) Kawrakow 2025-11-12 11:00:58 +02:00
  • e9e9fc3dfd Set mla=3 by default ik/mla=3_by_default Iwan Kawrakow 2025-11-12 10:55:41 +02:00
  • b73d66f76e Formatting ik/deepseek_merge_qk Iwan Kawrakow 2025-11-11 19:09:04 +02:00
  • 0576d42183 Merge Q and K for DeepSeek Iwan Kawrakow 2025-11-11 18:40:29 +02:00
  • 0d97b9c0bf Enable fusion by default (#939) Kawrakow 2025-11-11 10:35:48 +02:00
  • 9ecfee6c03 Enable fusion by default (#939) Kawrakow 2025-11-11 10:35:48 +02:00
  • 82780dfd55 Enable fusion by default ik/enable_fusion_by_default Iwan Kawrakow 2025-11-11 10:26:13 +02:00
  • 219fe93973 Opt from #880 also for iqk cuda gemv (#938) Kawrakow 2025-11-11 10:01:34 +02:00
  • 463c694676 Opt from #880 also for iqk cuda gemv (#938) Kawrakow 2025-11-11 10:01:34 +02:00
  • 5266eeea18 Opt from #880 also for iqk cuda gemv ik/iqk_mmvq_opt Iwan Kawrakow 2025-11-11 09:59:56 +02:00
  • 25cd985c9b Add --n-cpu-moe to llama_bench (#937) Kawrakow 2025-11-11 08:44:59 +02:00
  • 5e7f6711e4 Add --n-cpu-moe to llama_bench (#937) Kawrakow 2025-11-11 08:44:59 +02:00
  • 7854e64231 Add usage ik/llama_bench_n_cpu_moe Iwan Kawrakow 2025-11-11 08:41:30 +02:00
  • 54a51ff307 Add --n-cpu-moe to llama_banch Iwan Kawrakow 2025-11-11 08:39:17 +02:00
  • 121ed91165 Add rcache to llama-bench (#936) Kawrakow 2025-11-11 08:06:18 +02:00
  • 1e6f8ff89a Add rcache to llama-bench (#936) Kawrakow 2025-11-11 08:06:18 +02:00
  • febf3df389 Add rcache to llama-bench ik/llama_bench_rcache Iwan Kawrakow 2025-11-10 17:48:05 +02:00
  • 1223bc63b8 Minor: remove unnecesssary calls to build_inp_out_ids (#935) Kawrakow 2025-11-10 17:38:46 +02:00
  • 489554bc11 Minor: remove unnecesssary calls to build_inp_out_ids (#935) Kawrakow 2025-11-10 17:38:46 +02:00
  • 1cebf75ba4 Minor: remove unnecesssary calls to build_inp_out_ids ik/remove_unnecessary_calls Iwan Kawrakow 2025-11-10 17:37:33 +02:00
  • 263be6670b Add support for SmolLM3 (#934) Kawrakow 2025-11-10 15:40:12 +02:00
  • e4145c013f Add support for SmolLM3 (#934) Kawrakow 2025-11-10 15:40:12 +02:00
  • 5d90f711d4 Model loading and compute graph ik/smollm3 Iwan Kawrakow 2025-11-10 11:27:18 +02:00
  • 2309a97342 Convert from HF Iwan Kawrakow 2025-11-10 10:31:24 +02:00
  • 86e2bec04e DeepSeek FA optimizations (#929) Kawrakow 2025-11-10 09:55:30 +02:00
  • a313b71bf8 DeepSeek FA optimizations (#929) Kawrakow 2025-11-10 09:55:30 +02:00
  • 2bb57b4900 This seems better ik/deepseek_fa_opt Iwan Kawrakow 2025-11-09 10:57:52 +02:00
  • b5f0a2b617 Use new-new-mma also for MLA=3, and use mask bounds Iwan Kawrakow 2025-11-09 09:53:09 +02:00
  • adba641347 DeepSeek TG optimizations for TG (#928) Kawrakow 2025-11-10 09:52:07 +02:00
  • 7747000f3b DeepSeek TG optimizations for TG (#928) Kawrakow 2025-11-10 09:52:07 +02:00
  • eea6cc4433 Server: Add --draft-params to set draft model parameter via command line args (#932) firecoperana 2025-11-10 07:51:07 +00:00
  • 9dfbc69aee Server: Add --draft-params to set draft model parameter via command line args (#932) firecoperana 2025-11-10 07:51:07 +00:00
  • bf474e9bff Use fused gemv+add only for TG (#933) Kawrakow 2025-11-10 08:34:24 +02:00
  • ad688e10f4 Use fused gemv+add only for TG (#933) Kawrakow 2025-11-10 08:34:24 +02:00
  • ef64b1a171 Use fused gemv+add only for TG ik/fuse_bias_only_tg Iwan Kawrakow 2025-11-10 07:43:40 +02:00
  • 56ee303254 Make biased gemv fusion optional (#931) Kawrakow 2025-11-09 19:09:47 +02:00
  • db3bed2461 Make biased gemv fusion optional (#931) Kawrakow 2025-11-09 19:09:47 +02:00
  • 19ecaaad42 Remove forgotten printf ik/make_biased_gemv_optional Iwan Kawrakow 2025-11-09 18:27:59 +02:00
  • bcdc12d02b Fix one path through gemv fusion Iwan Kawrakow 2025-11-09 18:25:15 +02:00
  • 04ee681656 Make biased gemv fusion optional Iwan Kawrakow 2025-11-09 17:36:58 +02:00
  • 7df9947923 Fix compiler warning Kawrakow 2025-11-09 14:35:59 +02:00
  • 0db683e478 Fix compiler warning Iwan Kawrakow 2025-11-09 14:35:59 +02:00
  • fd37776584 Add ARM Grace Blackwell (NVIDIA DGX Spark) support (#922) Lennart Lopin 2025-11-09 07:22:40 -05:00
  • 1da9c218b0 Add ARM Grace Blackwell (NVIDIA DGX Spark) support (#922) Lennart Lopin 2025-11-09 07:22:40 -05:00
  • 73c28dbef4 server: bug fix for preserved_tokens not preserved in process_token (#926) firecoperana 2025-11-09 12:16:29 +00:00
  • ff4c1c6eb3 server: bug fix for preserved_tokens not preserved in process_token (#926) firecoperana 2025-11-09 12:16:29 +00:00
  • b63309a918 Fix embedding missing, CORS and crash using verbose in server (#924) firecoperana 2025-11-09 12:16:03 +00:00
  • 03235a4b11 Fix embedding missing, CORS and crash using verbose in server (#924) firecoperana 2025-11-09 12:16:03 +00:00
  • 5cc15d0ecf CUDA MoE improvements (#923) Kawrakow 2025-11-09 11:34:33 +02:00
  • 9207a48ab2 CUDA MoE improvements (#923) Kawrakow 2025-11-09 11:34:33 +02:00
  • aae817e50b DeepSeek TG optimizations for TG ik/deepseek_opt Iwan Kawrakow 2025-11-09 07:54:05 +02:00
  • defa6945b3 CUDA: fuse copies to K and V cache (#921) Kawrakow 2025-11-08 18:13:58 +02:00
  • e5fc02c71a CUDA: fuse copies to K and V cache (#921) Kawrakow 2025-11-08 18:13:58 +02:00