Commit Graph

  • 45cd1a70f5 Fix llama-bench mla parameter (#1016) Kawrakow 2025-11-27 09:33:30 +01:00
  • 0a6e650e29 Fix llama-bench mla parameter ik/fix_1015 Iwan Kawrakow 2025-11-27 10:30:58 +02:00
  • 8c39ff966d Change default RPC order and fix wrong RPC server order in --device arg (#1011) firecoperana 2025-11-26 09:51:51 -06:00
  • 5f3485c2c2 Change default RPC order and fix wrong RPC server order in --device arg (#1011) firecoperana 2025-11-26 09:51:51 -06:00
  • 0b126b2ca6 Fix prompt tokenization issue during prompt processing (#1008) firecoperana 2025-11-26 03:34:26 -06:00
  • 314300aa9f Fix prompt tokenization issue during prompt processing (#1008) firecoperana 2025-11-26 03:34:26 -06:00
  • 2339d41d2e Change default RPC order and fix wrong RPC order in --device arg fcp/fix_rpc_device firecoperana 2025-11-24 23:00:25 -06:00
  • 9337229274 Add MXFP4 to gguf-py constants (#1007) Kawrakow 2025-11-24 15:43:49 +01:00
  • 36c26701f6 Add MXFP4 to gguf-py constants (#1007) Kawrakow 2025-11-24 15:43:49 +01:00
  • 43f9f342dd Add MXFP4 to gguf-py constants ik/gguf_py_add_maxfp4 Iwan Kawrakow 2025-11-24 16:40:17 +02:00
  • a3b8efd687 Enable iq4_nl KV cache on CUDA (#1006) Kawrakow 2025-11-24 09:41:19 +01:00
  • ed12ca5517 Enable iq4_nl KV cache on CUDA (#1006) Kawrakow 2025-11-24 09:41:19 +01:00
  • 422585d726 Enable iq4_nl KV cache on CUDA ik/iq4_nl_cache Iwan Kawrakow 2025-11-24 10:39:14 +02:00
  • 0243356650 Fix q6_0 dequantize (#1005) Kawrakow 2025-11-24 09:13:46 +01:00
  • adabcd9c33 Fix q6_0 dequantize (#1005) Kawrakow 2025-11-24 09:13:46 +01:00
  • 8297d10111 Fix q6_0 dequantize ik/fix_q6_0_dequantize Iwan Kawrakow 2025-11-24 10:04:46 +02:00
  • 9a63e768ea Legacy quants cpy_blck_q_f16 function for K cache (#1001) Nexes the Elder 2025-11-24 08:56:38 +01:00
  • c886525963 Legacy quants cpy_blck_q_f16 function for K cache (#1001) Nexes the Elder 2025-11-24 08:56:38 +01:00
  • ada5a92241 Disable RoPE cache (#1004) Kawrakow 2025-11-24 07:09:46 +01:00
  • 8a2bbbe919 Disable RoPE cache (#1004) Kawrakow 2025-11-24 07:09:46 +01:00
  • 99e0e334a5 Disable RoPE cache ik/disable_rope_cache Iwan Kawrakow 2025-11-24 08:08:07 +02:00
  • 07d08e15ad webui update (#1003) firecoperana 2025-11-24 00:03:45 -06:00
  • a68d5802ae webui update (#1003) firecoperana 2025-11-24 00:03:45 -06:00
  • 920f424929 Support GigaChat3 (#995) Kawrakow 2025-11-24 06:55:14 +01:00
  • f1191036b2 Support GigaChat3 (#995) Kawrakow 2025-11-24 06:55:14 +01:00
  • f6163dd58f Fix: Register missing /apply-template endpoint (#999) gapeleon 2025-11-24 16:53:15 +11:00
  • 1feccd4174 Fix: Register missing /apply-template endpoint (#999) gapeleon 2025-11-24 16:53:15 +11:00
  • 7505165dee Fix truncated logprobs when streaming is off (#998) Yap Sok Ann 2025-11-24 12:52:15 +07:00
  • de3f330273 Fix truncated logprobs when streaming is off (#998) Yap Sok Ann 2025-11-24 12:52:15 +07:00
  • 15695a0617 fix kimi-k2 tool call (#996) hksdpc255 2025-11-24 16:51:16 +11:00
  • 80b79f365c fix kimi-k2 tool call (#996) hksdpc255 2025-11-24 16:51:16 +11:00
  • 0369d2ba44 Gigachat: CPU FA (needs 192 x 192 for MLA = 3) ik/support_gigachat Iwan Kawrakow 2025-11-21 11:44:34 +02:00
  • 360c8c6fd4 Gigachat: CUDA FA (needs 192 x 192 for MLA = 3) Iwan Kawrakow 2025-11-21 11:27:10 +02:00
  • e0bc495792 Fixing Gigachat support Iwan Kawrakow 2025-11-21 10:59:35 +02:00
  • 2e4bfed583 WIP: try syncing - not working yet ik/wip_sync_llama Kawrakow 2025-11-20 13:30:43 +00:00
  • 912c98f60b Fix requatizing from row-interleaved quants (#992) Kawrakow 2025-11-20 11:50:09 +01:00
  • bf12f502a4 Fix requatizing from row-interleaved quants (#992) Kawrakow 2025-11-20 11:50:09 +01:00
  • b9d25dc35b Fix requatizing from row-interleaved quants ik/fix_requantize_interleaved Kawrakow 2025-11-20 10:44:40 +00:00
  • e919c00cc9 Make gguf-py stuff work with numpy 2.0 (#991) Kawrakow 2025-11-20 10:20:55 +01:00
  • 60227f4433 Make gguf-py stuff work with numpy 2.0 (#991) Kawrakow 2025-11-20 10:20:55 +01:00
  • 8f7dd2f06b Make gguf-py stuff work with numpy 2.0 ik/gguf_py_changes_for_np2.0 Kawrakow 2025-11-20 09:11:01 +00:00
  • 187e37bad8 Fix Kimi2 parsing issues (#989) Kawrakow 2025-11-20 10:08:02 +01:00
  • 1128a55b0a Fix Kimi2 parsing issues (#989) Kawrakow 2025-11-20 10:08:02 +01:00
  • 4b731fe333 Fix junja -> junja ik/fix_kimi2_parse Kawrakow 2025-11-20 08:01:21 +00:00
  • 64a094c38e Add @hksdpc255's junja templates Kawrakow 2025-11-20 06:26:25 +00:00
  • d814e29a1c Fix Kimi2 chat parse Kawrakow 2025-11-20 06:13:27 +00:00
  • 1321645149 Disable split mode "row" (#987) Kawrakow 2025-11-19 16:15:50 +01:00
  • 0f6986a33c Disable split mode "row" (#987) Kawrakow 2025-11-19 16:15:50 +01:00
  • 00259c14a7 Also llama-bench ik/disable_sm_row Kawrakow 2025-11-19 15:14:52 +00:00
  • c82c004d9d Disable split mode "row" Kawrakow 2025-11-19 15:08:23 +00:00
  • 2cbfd04d88 Server: Handle context shift better to reduce prompt processing time (#973) firecoperana 2025-11-19 15:04:48 +00:00
  • bacb8fb79f Server: Handle context shift better to reduce prompt processing time (#973) firecoperana 2025-11-19 15:04:48 +00:00
  • af10490331 Attempt to fix #974 (#983) Kawrakow 2025-11-19 15:48:39 +01:00
  • 232050b473 Attempt to fix #974 (#983) Kawrakow 2025-11-19 15:48:39 +01:00
  • 810c47fc38 Attempt to fix #974 ik/try_fix_974 Kawrakow 2025-11-19 07:59:46 +00:00
  • 5e525cd6de Fuse sum_rows and div with topk-moe (#984) Kawrakow 2025-11-19 13:44:09 +01:00
  • d764edd652 Fuse sum_rows and div with topk-moe (#984) Kawrakow 2025-11-19 13:44:09 +01:00
  • d82543a059 Make sure we can fuse Q and K RoPE for DeepSeek models (#985) Kawrakow 2025-11-19 13:43:08 +01:00
  • 047a519771 Make sure we can fuse Q and K RoPE for DeepSeek models (#985) Kawrakow 2025-11-19 13:43:08 +01:00
  • c1d0738a1b Make sure we can fuse Q and K RoPE for DeepSeek models ik/deepseek_guarantee_rope_fusion Kawrakow 2025-11-19 12:39:34 +00:00
  • f514891418 Fuse sum_rows and div with topk-moe ik/topk_moe_with_norm Kawrakow 2025-11-19 10:14:33 +00:00
  • 2fb54d232f Fuse Q and K RoPE (#980) Kawrakow 2025-11-19 09:08:42 +01:00
  • 054c31cf8f Fuse Q and K RoPE (#980) Kawrakow 2025-11-19 09:08:42 +01:00
  • da5de88073 common: Generalized XML-style tool-call parsing with streaming support (#958) hksdpc255 2025-11-19 01:29:58 +11:00
  • 2ebd715fa0 common: Generalized XML-style tool-call parsing with streaming support (#958) hksdpc255 2025-11-19 01:29:58 +11:00
  • 5195e38d47 Fuse Q and K RoPE ik/fused_rope_rope Kawrakow 2025-11-18 12:05:15 +00:00
  • eb3ce7549b Minor Iwan Kawrakow 2025-11-18 08:55:36 +00:00
  • 0157f78061 Minor Kawrakow 2025-11-18 08:55:36 +00:00
  • 25b9ac296f Add usage for -vq, --validate-quants (#977) Kawrakow 2025-11-17 16:02:14 +01:00
  • 412e4f6e23 Add usage for -vq, --validate-quants (#977) Kawrakow 2025-11-17 16:02:14 +01:00
  • a1c32c1d39 Add usage for -vq, --validate-quants ik/add_vq_help Iwan Kawrakow 2025-11-17 16:58:09 +02:00
  • 415015f386 Handle context shift better to reduce pp fcp/context_shift_fix firecoperana 2025-11-16 07:54:56 -06:00
  • d72206dd79 Add mqkv and rcache for Gemma3 (#972) Kawrakow 2025-11-16 19:10:41 +02:00
  • 294aec2bc2 Add mqkv and rcache for Gemma3 (#972) Kawrakow 2025-11-16 19:10:41 +02:00
  • 2affe8730a Add mqkv and rcache for Gemma3 ik/gemma3_mqkv_rcache Iwan Kawrakow 2025-11-16 17:38:59 +02:00
  • dffb45d44a Fix rtr when mqkv is enabled (#971) Kawrakow 2025-11-16 16:51:45 +02:00
  • a8b3c8ae73 Fix rtr when mqkv is enabled (#971) Kawrakow 2025-11-16 16:51:45 +02:00
  • a63a3492d3 Fix rtr when mqkv is enabled ik/fix_rtr_mqkv Kawrakow 2025-11-16 14:42:31 +00:00
  • 17d618a6dd Add ability to use RoPE cache to DeepSeek models (#970) Kawrakow 2025-11-16 16:50:02 +02:00
  • eafa77c412 Add ability to use RoPE cache to DeepSeek models (#970) Kawrakow 2025-11-16 16:50:02 +02:00
  • 8e2661afc8 Add ability to use RoPE cache to DeepSeek models ik/deepseek_rope_cache Kawrakow 2025-11-16 13:32:12 +00:00
  • 3008fdf0b6 Allow distinct output tensor for Gemma models (#969) Kawrakow 2025-11-16 12:12:41 +02:00
  • 4d003e29ee Allow distinct output tensor for Gemma models (#969) Kawrakow 2025-11-16 12:12:41 +02:00
  • 4e2f4b739d Allow distinct output tensor for Gemma models ik/gemma_output_tensor Iwan Kawrakow 2025-11-16 12:08:19 +02:00
  • 03da76eb05 Fix RoPE cache on multi-GPU setup (#966) Kawrakow 2025-11-16 11:50:48 +02:00
  • 0388a72d5d Fix RoPE cache on multi-GPU setup (#966) Kawrakow 2025-11-16 11:50:48 +02:00
  • 37d72f9878 Fix ggml_cuda_fattn_is_supported (#968) Kawrakow 2025-11-16 11:50:29 +02:00
  • 085a4d0cb9 Fix ggml_cuda_fattn_is_supported (#968) Kawrakow 2025-11-16 11:50:29 +02:00
  • 6bbc2c42ba Fix ggml_cuda_fattn_is_supported ik/fix_fattn_supported Iwan Kawrakow 2025-11-16 11:46:30 +02:00
  • 3502b9793b Fix RoPE cache on multi-GPU setup ik/really_fix_rope_cache Kawrakow 2025-11-16 06:01:50 +00:00
  • b40d11b22d Fix kv cache save and load for GLM model (#965) firecoperana 2025-11-15 15:04:16 +00:00
  • fcb0b472dd Fix kv cache save and load for GLM model (#965) firecoperana 2025-11-15 15:04:16 +00:00
  • 5ec0def0ef Fix compiler warnings (#963) firecoperana 2025-11-15 05:07:15 +00:00
  • f3db96e539 Fix compiler warnings (#963) firecoperana 2025-11-15 05:07:15 +00:00
  • bb358223cd server: cache prompt to host memory (#954) firecoperana 2025-11-14 16:40:13 +00:00
  • 0cb6dcc8c8 server: cache prompt to host memory (#954) firecoperana 2025-11-14 16:40:13 +00:00
  • 00dffb5e68 Add --chat-template-file to usage (#959) Kawrakow 2025-11-14 11:08:58 +02:00
  • 2642f48921 Add --chat-template-file to usage (#959) Kawrakow 2025-11-14 11:08:58 +02:00
  • ca03d07bb6 Add --chat-template-file to usage ik/add_jinja_file_help Iwan Kawrakow 2025-11-14 11:07:32 +02:00
  • 668c37d4cf DeepSeek: enable option to merge Q and K tensors (#941) Kawrakow 2025-11-14 08:23:04 +02:00