Commit Graph

  • b837773728 Port speculative decoding from upstream to llama-server (#645) g2mt 2025-08-15 21:26:44 -07:00
  • 2e2abddaa8 Quick hack to improve TG performance for SWA models (#692) Kawrakow 2025-08-15 16:43:04 +03:00
  • 4239d259a6 Quick hack to improve TG performance for SWA models (#692) Kawrakow 2025-08-15 16:43:04 +03:00
  • 1a102919c4 Quick hack to improve TG performance for SWA models [ik/cpu_swa_v0] Iwan Kawrakow 2025-08-15 14:23:33 +03:00
  • 5eaa068e48 Merge branch 'main' into s6/mikupad Saood Karim 2025-08-15 03:13:05 -05:00
  • 633e0617b0 Enable CUDA graphs for MoE models + GPT-OSS support (#689) Kawrakow 2025-08-15 09:18:07 +03:00
  • fc06bc9d27 Enable CUDA graphs for MoE models + GPT-OSS support (#689) Kawrakow 2025-08-15 09:18:07 +03:00
  • 3d5141a44f Turn graphs on by default [ik/try_cuda_graphs] Iwan Kawrakow 2025-08-15 09:15:48 +03:00
  • 6873fbcbdf Minor Iwan Kawrakow 2025-08-15 08:32:35 +03:00
  • 54084f28c0 Disable graphs without -fmoe Iwan Kawrakow 2025-08-15 08:05:22 +03:00
  • 2767ae6934 Fix visual bug Saood Karim 2025-08-14 23:44:52 -05:00
  • 8a83e1f083 cuda: re-add q8_0 -> q8_0 transpose Iwan Kawrakow 2025-08-14 19:33:58 +03:00
  • 1faa0868e7 Iterating on Windows build failures Iwan Kawrakow 2025-08-14 16:06:10 +03:00
  • be638d7c80 Iterating on Windows build failures [ik/gpt-oss] Iwan Kawrakow 2025-08-14 16:06:10 +03:00
  • f6a48fa644 Fix llama_mmap on non-Linux platforms Iwan Kawrakow 2025-08-14 15:35:06 +03:00
  • d249d958ef Adding forgotten file Iwan Kawrakow 2025-08-14 15:40:44 +03:00
  • 14ec5c7fd7 Fix llama_mmap on non-Linux platforms Iwan Kawrakow 2025-08-14 15:35:06 +03:00
  • ca703ed677 cuda: cpy for q6_0 Iwan Kawrakow 2025-08-14 14:07:07 +03:00
  • 7bdaae8861 Make q8_0 cache work for DeepSeek models with CUDA graphs Iwan Kawrakow 2025-08-14 13:44:00 +03:00
  • 486dc0b3ee CUDA graphs - seems to be working Iwan Kawrakow 2025-08-14 12:00:42 +03:00
  • 2791e14753 CUDA graphs WIP - still not working Iwan Kawrakow 2025-08-14 11:18:07 +03:00
  • 9b16add2d4 Attempt to use CUDA graphs with MoE models - not working Iwan Kawrakow 2025-08-14 10:43:14 +03:00
  • 0fe47c57eb Fix CUDA after latest changes Iwan Kawrakow 2025-08-13 17:23:13 +03:00
  • d2985c6a64 Minor Iwan Kawrakow 2025-08-13 16:01:47 +03:00
  • 81e97b81be Fix llama_mmap so mmap works Iwan Kawrakow 2025-08-13 15:54:32 +03:00
  • bf28e515a3 It runs, but mmap does not work Iwan Kawrakow 2025-08-13 15:24:41 +03:00
  • 949b686412 Builds successfully Iwan Kawrakow 2025-08-13 14:31:48 +03:00
  • c00335684c Gracefully fail the decode instead of crashing for kshift Deepseek error (#688) saood06 2025-08-13 05:12:40 -05:00
  • e082df47f2 Gracefully fail the decode instead of crashing for kshift Deepseek error (#688) saood06 2025-08-13 05:12:40 -05:00
  • b951930e31 minor [s6/fix_kshift_crash] Saood Karim 2025-08-13 05:09:16 -05:00
  • a5a274a007 fix formatting Saood Karim 2025-08-13 05:05:41 -05:00
  • 781df37e16 Gracefuly fail the decode instead of crashing for kshift Deepseek error) Saood Karim 2025-08-13 04:43:11 -05:00
  • 6b6d25bfbf llama: factor out model loader Iwan Kawrakow 2025-08-13 12:12:46 +03:00
  • 2ac615507f Simdify swiglu_oai Iwan Kawrakow 2025-08-12 19:40:35 +03:00
  • 8bd983300c gpt_oss: Implement -fmoe on the CPU Iwan Kawrakow 2025-08-12 18:44:40 +03:00
  • aa5a187a44 Add sinks to iqk flash attention Iwan Kawrakow 2025-08-12 15:44:48 +03:00
  • 38533d6bd4 Move row sums to the write place Iwan Kawrakow 2025-08-12 13:39:37 +03:00
  • 5abc39481b gpt-oss: add ability to use -fmoe (only CUDA for now) Iwan Kawrakow 2025-08-12 09:49:18 +03:00
  • 4a5695ef6f Finalize UI Saood Karim 2025-08-11 04:48:07 -05:00
  • 464b8fc03b CUDA: add head size of 64 to new mma Iwan Kawrakow 2025-08-11 11:10:45 +03:00
  • 21ced1e3c1 Fix completions endpoint (#684) firecoperana 2025-08-11 01:43:20 -05:00
  • d99cf7cb71 Fix completions endpoint (#684) firecoperana 2025-08-11 01:43:20 -05:00
  • be74226840 Major UI work (and also add update backend endpoints to accomadate) Saood Karim 2025-08-10 23:04:20 -05:00
  • 3cd7e5c9b4 gpt-oss: add sinks to the attn-vec kernels Iwan Kawrakow 2025-08-10 20:02:33 +03:00
  • 2b91a9d299 gpt-oss: Seems to be working on CUDA Iwan Kawrakow 2025-08-10 16:31:55 +03:00
  • 9dcdd652c7 CUDA: ADD_ID Iwan Kawrakow 2025-08-10 14:08:32 +03:00
  • 42bed31488 gpt-oss: CPU seems to be working Iwan Kawrakow 2025-08-10 13:13:05 +03:00
  • c69d04f324 gpt-oss: WIP llama Iwan Kawrakow 2025-08-10 10:09:42 +03:00
  • e24a1d3eda gpt-oss: attnetion sinks, swiglu_oai Iwan Kawrakow 2025-08-09 17:11:56 +03:00
  • 24ac2596ef gmp-oss: common Iwan Kawrakow 2025-08-09 11:10:19 +03:00
  • ff024df079 add jinja template support (#677) firecoperana 2025-08-09 07:50:30 -05:00
  • d60c8f4d3b add jinja template support (#677) firecoperana 2025-08-09 07:50:30 -05:00
  • e23b2a7cc9 MXFP4 (#682) Kawrakow 2025-08-09 08:40:18 +03:00
  • 7117c23de4 MXFP4 (#682) Kawrakow 2025-08-09 08:40:18 +03:00
  • 80bdee3f85 mxfp4: minor CUDA tweaks [ik/mxfp4] Iwan Kawrakow 2025-08-09 08:15:37 +03:00
  • 34bb912db1 mxfp4: CUDA MMQ Iwan Kawrakow 2025-08-08 20:17:10 +03:00
  • c0449207cf mxfp4: CUDA GEMV Iwan Kawrakow 2025-08-08 19:57:20 +03:00
  • 3466dbda40 maxfp4: CUDA dequantize Iwan Kawrakow 2025-08-08 19:20:39 +03:00
  • fd8384e3aa Port cpu moe options from mainline (#672) Parsa 2025-08-08 04:38:18 -07:00
  • 2fd28e97c7 Fix for Deepseek r1 parsing (#676) Anton Sokolchenko 2025-08-08 12:56:44 +02:00
  • 2cf2fc2a2f Fix quantized K cache without FA (#680) Kawrakow 2025-08-08 13:51:14 +03:00
  • 7388d9be8d mxfp4: Metal Iwan Kawrakow 2025-08-08 18:17:34 +03:00
  • 19d6799652 mxfp4: repacked GEMM (NEON) Iwan Kawrakow 2025-08-08 17:37:40 +03:00
  • 679ca66a31 mxfp4: NEON GEMM Iwan Kawrakow 2025-08-08 17:31:36 +03:00
  • 9238445f19 mxfp4: AVX2 GEMM Iwan Kawrakow 2025-08-08 16:27:12 +03:00
  • 6bda22a4d6 Port cpu moe options from mainline (#672) Parsa 2025-08-08 04:38:18 -07:00
  • 293f4aa433 Port cpu moe options from mainline (#672) Parsa 2025-08-08 04:38:18 -07:00
  • dc1746338c Fix for Deepseek r1 parsing (#676) Anton Sokolchenko 2025-08-08 12:56:44 +02:00
  • fa7a0f340e Fix for Deepseek r1 parsing (#676) Anton Sokolchenko 2025-08-08 12:56:44 +02:00
  • d95ac93027 Fix quantized K cache without FA (#680) Kawrakow 2025-08-08 13:51:14 +03:00
  • 41f0d2e5de Fix quantized K cache without FA (#680) Kawrakow 2025-08-08 13:51:14 +03:00
  • f747cbca08 Fix MMQ when running with quantized K cache without FA [ik/fix_quantized_kv_nofa] Iwan Kawrakow 2025-08-08 13:42:03 +03:00
  • 0dce7f9128 Prevent assert with quantized K cache and no FA Iwan Kawrakow 2025-08-08 11:18:26 +03:00
  • a5e87adfa7 mxfp4: repacked GEMM (AVX2/Zen4) Iwan Kawrakow 2025-08-08 10:58:11 +03:00
  • 294341a3d2 mxfp4: Zen4 GEMM Iwan Kawrakow 2025-08-08 09:23:02 +03:00
  • 58c3bffff4 mxfp4: basics Iwan Kawrakow 2025-08-08 08:41:04 +03:00
  • ffd211849b Vulkan: add cmake options to build without coopmat(2) support (#674) Kawrakow 2025-08-07 17:26:21 +03:00
  • 58f3bda0ae Vulkan: add cmake options to build without coopmat(2) support (#674) Kawrakow 2025-08-07 17:26:21 +03:00
  • 012feab4b1 Vulkan: add cmake options to build without coopmat(2) support [ik/vulkan1] Iwan Kawrakow 2025-08-07 14:09:15 +03:00
  • 05a61510b9 Fix Qwen3 content extraction breaking code formatting (#661) Anton Sokolchenko 2025-08-07 07:22:01 +02:00
  • dee40cffb6 Fix Qwen3 content extraction breaking code formatting (#661) Anton Sokolchenko 2025-08-07 07:22:01 +02:00
  • f4051d9c3e Deepseek R1 function calls (more formats) (#652) Anton Sokolchenko 2025-08-07 07:15:57 +02:00
  • e484944bc0 Deepseek R1 function calls (more formats) (#652) Anton Sokolchenko 2025-08-07 07:15:57 +02:00
  • d65d5fe29e Add support for GLM-4.5 models (#668) Thireus ☠ 2025-08-07 05:55:00 +01:00
  • 47c3dc798c Add support for GLM-4.5 models (#668) Thireus ☠ 2025-08-07 05:55:00 +01:00
  • ddceb0a55d Merge pull request #648 from ikawrakow/fcp/missing_token_ps firecoperana 2025-07-26 21:13:52 -05:00
  • ae0ba31fd0 Merge pull request #648 from ikawrakow/fcp/missing_token_ps firecoperana 2025-07-26 21:13:52 -05:00
  • 33daaf7310 Fix text generation endpoint (#654) Anton Sokolchenko 2025-07-27 02:36:48 +02:00
  • d65c8cea5a Fix text generation endpoint (#654) Anton Sokolchenko 2025-07-27 02:36:48 +02:00
  • f443040d49 webui: move preset settings to top firecoperana 2025-07-20 07:46:36 -05:00
  • 608ff76170 webui: move preset settings to top firecoperana 2025-07-20 07:46:36 -05:00
  • 981259fb8b bug fix no timings after tool update firecoperana 2025-07-24 13:49:11 -05:00
  • 82d9bc03a9 bug fix no timings after tool update firecoperana 2025-07-24 13:49:11 -05:00
  • cfc8f5a61b Enable LLM function calls (#643) Anton Sokolchenko 2025-07-24 20:24:12 +02:00
  • 4e9c78c039 Enable LLM function calls (#643) Anton Sokolchenko 2025-07-24 20:24:12 +02:00
  • dffa0a95b3 IQ4_KSS improvements (#642) Kawrakow 2025-07-23 20:50:57 +02:00
  • 1b05210904 IQ4_KSS improvements (#642) Kawrakow 2025-07-23 20:50:57 +02:00
  • 0486b5ad93 Update README.md Kawrakow 2025-07-23 19:38:54 +02:00
  • 9defcebecc Update README.md Kawrakow 2025-07-23 19:38:54 +02:00
  • d78df741ce Update AUTHORS Kawrakow 2025-07-23 18:14:51 +02:00