mirror of https://github.com/ikawrakow/ik_llama.cpp.git
synced 2026-06-05 20:54:03 +00:00
Commit Graph
Select branches
fcp/context_shift_fix
fcp/cors-proxy
fcp/fix_rpc_device
fcp/webui_update2
fix-recurrent-ckpt-prealloc
ik/FlashMLA-3
ik/adapt_iq1_iq2_bn
ik/adaptive_p
ik/adaptive_p_2
ik/add_extra_output_tensor
ik/add_forgotten_multi_add
ik/add_granite
ik/add_iq3ks_to_gguf
ik/add_jinja_file_help
ik/add_missing_enum_values_qwen3
ik/add_missing_gguf_constants
ik/add_missing_mmq_iq5ks
ik/add_mmq_id
ik/add_mtmd
ik/add_q60
ik/add_vq_help
ik/allow_empty_splits
ik/andrew_trellis
ik/another_mmq_id_fix
ik/apply_cuda_faster_iq3k
ik/arch_flags
ik/arm_better_r4
ik/attn_gemm
ik/avoid_cuda_mla_1
ik/avoid_per_step_ssm_copy
ik/avoid_recurrent_state_copy
ik/avx2_bf16
ik/avx2_flash_attn
ik/avx2_flash_attn_2
ik/avx2_q4_0_q8_0
ik/avx2_q5_0
ik/avx2_r4_tweaks
ik/backend_reduce_syncs
ik/bailingmoe2
ik/bailingmoe2_graph
ik/barrier
ik/bench_gp
ik/better_batched_processing
ik/better_cpu_fa_thread_strategy
ik/better_fa_glm45
ik/better_fa_masking
ik/better_fixup_stream_k
ik/better_flash_mla
ik/better_graph_pp
ik/better_graph_tg
ik/better_iq4_nl
ik/better_iqk_strategy
ik/better_model_info
ik/better_moe_small_batch
ik/better_mtp
ik/better_n_cpu_moe
ik/better_q40_kv_cache
ik/better_q40_kv_cache_cpu
ik/better_tg_fattn
ik/bf16_kv_cache
ik/bf16_r4
ik/biased_mmvq
ik/biased_qkv
ik/bitnet_adjustments
ik/bitnet_cuda
ik/bitnet_fused_unary
ik/bitnet_improve_metal
ik/bitnet_optional_scales
ik/bitnet_token_embedding_gpu
ik/bitnet_token_embedding_gpu_2
ik/bonsai_avx2
ik/bonsai_neon
ik/buffer_type_overrides
ik/bug_missing_parentheses
ik/cached_graph
ik/change_default_fa_offset
ik/change_fmoe_fa_defaults
ik/change_q_pure
ik/chat_templates
ik/check_cpu_fa_supported_types
ik/check_for_empty_mask
ik/check_up_gate_fmoe
ik/clang_warnings
ik/cleanup_fudge_factors
ik/cohere2
ik/cohere2_sm_graph
ik/convert_i2s
ik/copyright
ik/correct_glm47_flash_gating_func
ik/correct_missing_gating_func_comments
ik/cpp_17
ik/cpu_argsort
ik/cpu_deepseek_fa
ik/cpu_fa_dont_repack_tg
ik/cpu_fa_tg_glm4.5
ik/cpu_mla_all_quants
ik/cpu_moe_tg
ik/cpu_repeat
ik/cpu_swa_fa
ik/cpu_swa_v0
ik/cpu_swa_v1
ik/cpu_swa_v2
ik/cpu_topk_moe
ik/cuda_better_moe
ik/cuda_bf16
ik/cuda_ctx_mess
ik/cuda_faster_iq2k
ik/cuda_faster_iq4nl_kvcache
ik/cuda_faster_moe_tg
ik/cuda_fattn_Dk_Dv
ik/cuda_fix_quantized_flash_mla3
ik/cuda_flash_mla3
ik/cuda_flash_mla3_v2
ik/cuda_flash_mla_q8_0
ik/cuda_graphs_with_overrides
ik/cuda_grouped_topk
ik/cuda_iq1_m_r4
ik/cuda_iq1_s_r4
ik/cuda_iq2k_use_bperm1
ik/cuda_iq3k_use_bperm1
ik/cuda_iq4_k_r4
ik/cuda_iqk_ks_r4
ik/cuda_iqk_r4
ik/cuda_large_cpy
ik/cuda_lto
ik/cuda_mailine_fixes
ik/cuda_mla
ik/cuda_mla2
ik/cuda_mmq_iq2_k
ik/cuda_mmq_iq4_k
ik/cuda_mmq_iq4_ks
ik/cuda_native
ik/cuda_params
ik/cuda_q4_0_r4
ik/cuda_quantized_fmoe
ik/cuda_refactor_fattn
ik/cuda_rms_non_contiguous
ik/cuda_rope_back
ik/cuda_set_device
ik/cuda_swa2
ik/cuda_swa3
ik/cuda_topk_moe
ik/cuda_tracer
ik/cuda_use_bperm
ik/cuda_use_pinned_memory
ik/cuda_use_pinned_memory_2
ik/custom_q_rules
ik/debug_849
ik/debug_issue_721
ik/debug_issue_733
ik/dedup_stb_image
ik/deepseek_fa_opt
ik/deepseek_guarantee_rope_fusion
ik/deepseek_is_this_better
ik/deepseek_merge_qk
ik/deepseek_mla0
ik/deepseek_opt
ik/deepseek_rope_cache
ik/delta_dry
ik/delta_net
ik/delta_net_neon
ik/delta_net_tweaks
ik/dequant_gemm
ik/dequant_moe_gemm
ik/desperate_bug_fix_attempt
ik/disable_add_fused_rms
ik/disable_experimental_code1
ik/disable_fusion_by_default
ik/disable_k_shift_smgraph
ik/disable_khadamard_if_not_power2
ik/disable_multi_add
ik/disable_or_enable_p2p
ik/disable_rope_cache
ik/disable_sm_row
ik/disable_smgraph_qwen35moe_mtp
ik/disable_smgraph_recurrent
ik/disable_some_fusion
ik/disable_vocab_debug
ik/disabled_cuda_graphs
ik/disallow_speculation_for_hybrid
ik/dont_abort_on_nccl_init_failure
ik/dont_split_output
ik/dup_experts_bias
ik/enable_all_iqk_fa_quants
ik/enable_cuda_graphs_with_reduce
ik/enable_fusion_by_default
ik/enable_mla3_in_crippled_ggufs
ik/enable_smgraph_mla_hybrid
ik/ernie_graph
ik/extra_reduce_types
ik/fa_mainline_compat
ik/fa_offset_2
ik/falcon3
ik/falcon3a
ik/falcon_edge
ik/fancy_simd_log
ik/fast_sampling_avx2
ik/faster_avx2_q40
ik/faster_cpu_fused_rms
ik/faster_cpu_fused_rms1
ik/faster_imatrix
ik/faster_iq2ks_quantize
ik/faster_iq3_iq5_quantize
ik/faster_iq4k
ik/faster_iq4k_quantize
ik/faster_iq4nl_quantize
ik/faster_moe_quantize
ik/faster_per_step_restore
ik/faster_q60_avx2
ik/fattn_Dk_Dv
ik/fattn_bf16
ik/fattn_enable_iq4_nl
ik/fattn_enable_q6_0
ik/fattn_fix_overflow
ik/fattn_gqa_10
ik/fattn_is_supported
ik/fattn_kq_max_offset
ik/fattn_kqv
ik/fattn_mma
ik/fattn_q35dense
ik/fattn_work_buffer
ik/fdn_fuse_silu_cpu
ik/fit_dense_model
ik/fix_1015
ik/fix_1055
ik/fix_1205
ik/fix_1237
ik/fix_1432
ik/fix_1438
ik/fix_1478
ik/fix_300
ik/fix_358
ik/fix_412
ik/fix_447
ik/fix_499
ik/fix_538
ik/fix_596
ik/fix_827
ik/fix_Makefile
ik/fix_add_bf16_turing
ik/fix_after_883
ik/fix_again_cmake
ik/fix_annoying_warnings
ik/fix_arm_fa
ik/fix_avx2_gemm_mess
ik/fix_avx2_iq4_nl_r4
ik/fix_avx512_vs_fancy_simd
ik/fix_batched_cublas
ik/fix_bench_compile
ik/fix_bug_481
ik/fix_bug_added_in_1506
ik/fix_comma_pauses
ik/fix_compiler_warnings
ik/fix_contiguously_allocated
ik/fix_cpu_fa_bf16
ik/fix_cpu_fa_work_buffer_size
ik/fix_cuda_fa_race
ik/fix_cuda_memcpy_async
ik/fix_cuda_nans
ik/fix_cuda_scale_bug
ik/fix_debug_build
ik/fix_deepseek_fattn
ik/fix_deepseek_q80_cache
ik/fix_dequantize_when_requantizing
ik/fix_div_zero
ik/fix_dst_backend
ik/fix_dup_q
ik/fix_exp_shexp_split
ik/fix_experts_node_name
ik/fix_fa_192_128
ik/fix_fa_avx2_bug
ik/fix_fattn_odd_even
ik/fix_fattn_supported
ik/fix_flash_attn
ik/fix_fused_grouped_topk
ik/fix_gcc_arm
ik/fix_gemma3_vision
ik/fix_gemma4_hybrid
ik/fix_gemma4_quantized_KV_cache_cuda
ik/fix_gemma4_quantized_kv_cache_cpu
ik/fix_gemma_e4b
ik/fix_ggml_common
ik/fix_ggml_nbytes
ik/fix_glm4_attn
ik/fix_glm_mtp
ik/fix_glm_mtp_accept
ik/fix_glm_mtp_smgraph
ik/fix_gpt_oss_partial_offload
ik/fix_graph_parallel_partial_offload
ik/fix_hadamard_bug
ik/fix_hybrid_detection
ik/fix_hybrid_graph_muge
ik/fix_imatrix_check
ik/fix_imatrix_nonsense
ik/fix_iq4k_avx2
ik/fix_iqk_for_strange_numrows
ik/fix_jinja
ik/fix_kimi2_parse
ik/fix_kld
ik/fix_kq
ik/fix_llama4_attention
ik/fix_llama_kv_cache_cell_max
ik/fix_metal_fa
ik/fix_minimax_hadamard
ik/fix_misleading_quantize_error
ik/fix_missing_bf16_avx512
ik/fix_missing_dry
ik/fix_missing_end
ik/fix_mistral3_smgraph
ik/fix_mla1
ik/fix_mla_imatrix
ik/fix_mla_smgraph_cache_load_save
ik/fix_mmproj_bf16_cpu
ik/fix_mmq_id
ik/fix_mmq_overflow
ik/fix_mmvq_bug
ik/fix_mtp_discarding
ik/fix_mtp_no_gr
ik/fix_mtp_plus_muge
ik/fix_mul_mat_16
ik/fix_multiple_choice
ik/fix_neon_build
ik/fix_neon_legacy_quants
ik/fix_neon_q82
ik/fix_no_iqk_build
ik/fix_no_p2p_case
ik/fix_partial_ngl_smgraph
ik/fix_partial_ngl_smgraph_mla
ik/fix_partial_offload_crash
ik/fix_perf_regression
ik/fix_pr_261
ik/fix_pr_842
ik/fix_q35moe_mtp_smgraph
ik/fix_q41_q51_arm
ik/fix_q5_0_fa
ik/fix_q6_0_dequantize
ik/fix_q80_avx2_2
ik/fix_q80_avx2_mess
ik/fix_q80_moe_avx2
ik/fix_quantize_kt
ik/fix_quantized_k_cache
ik/fix_quantized_kv_nofa
ik/fix_qwen35_smgraph_hybrid
ik/fix_qwen35moe_low_mtp_acceptance
ik/fix_reduce_race
ik/fix_reduce_windows
ik/fix_repacked_legacy_quants
ik/fix_replace_all
ik/fix_requantize_interleaved
ik/fix_requantize_interleaved_2
ik/fix_ring_reduction
ik/fix_rope_norm_fast_cuda
ik/fix_rpc_off
ik/fix_rpc_off2
ik/fix_rtr_mqkv
ik/fix_ser
ik/fix_ser_cuda
ik/fix_sm_graph_with_vision
ik/fix_standard_attention_cpu
ik/fix_sync_logic
ik/fix_the_fix
ik/fix_typo
ik/fix_unknown_tensor_type
ik/fix_up_gate_mmq_not_supported
ik/fix_vulkan_required
ik/fix_windows
ik/fix_windows_avx512
ik/fix_windows_no_omp
ik/fix_xeon_6226R
ik/flash_mla
ik/flash_mla2_cuda_no_f32
ik/flash_mla2_no_f32
ik/flash_mla_2
ik/flash_mla_4
ik/flash_precision
ik/flax-vector-conversions
ik/format_name
ik/fuse_add_add_fused_rms
ik/fuse_add_fused_rms
ik/fuse_bias_only_tg
ik/fuse_biased_qkv
ik/fuse_kvcache_copy
ik/fuse_merge_up_gate_exps
ik/fuse_moe_up_gate
ik/fuse_mul_mat_scale
ik/fuse_qkv
ik/fuse_rms_rms_add
ik/fuse_ssm_silu_neon
ik/fused_bailingmoev2
ik/fused_delta_net
ik/fused_delta_net_2
ik/fused_delta_net_3
ik/fused_delta_net_3a
ik/fused_delta_net_avx512
ik/fused_ffn_up_gate
ik/fused_mul_multiadd
ik/fused_mul_unary
ik/fused_mul_unary_1
ik/fused_norm
ik/fused_rms_norm
ik/fused_rms_rms
ik/fused_rope_rope
ik/fused_softcap_softmax
ik/fused_up_gate_unary
ik/gemm_4d
ik/gemm_iq1s
ik/gemm_neon_1bit
ik/gemm_neon_iqk
ik/gemm_neon_iquants
ik/gemm_neon_kquants
ik/gemm_neon_legacy
ik/gemma3
ik/gemma3_mqkv_rcache
ik/gemma4
ik/gemma4_12B_smgraph
ik/gemma4_fuse_logits
ik/gemma4_gp_bugfix
ik/gemma4_mtmd_blindness
ik/gemma4_mtp_avoid_f32_cast
ik/gemma4_mtp_extra_output
ik/gemma4_routing
ik/gemma4_tokenizer_fixes
ik/gemma4_vision
ik/gemma_output_tensor
ik/gemma_q80_kvcache
ik/gemv_bf16_r16
ik/gguf_bool_arrays
ik/gguf_py_add_maxfp4
ik/gguf_py_changes_for_np2.0
ik/glm45_tg_fa_hack
ik/glm45_tg_very_fast
ik/glm47_fa_2
ik/glm47_tg_fa_hack
ik/glm5
ik/glm5_mtp
ik/glm_flash
ik/gpt-oss
ik/gpt_oss_graph
ik/gpu_layers
ik/gpu_layers_2
ik/gpu_layers_3
ik/graph_alloc
ik/graph_better_splits
ik/graph_parallel_tweak
ik/graph_reuse
ik/graph_reuse_field
ik/graph_reuse_on
ik/hadamard_512
ik/hadamard_block_size
ik/handle_incompatible_deepseek_ggufs
ik/handle_split_cache
ik/hide_imatrix
ik/honor_manual_splits
ik/hsums
ik/huihui_57B
ik/hunyuan_graph
ik/ignore_nextn
ik/ignore_nextn_layers
ik/imatrix_ffn_gate
ik/imatrix_fused_up_gate
ik/imatrix_lsim
ik/improve_iq1m
ik/improve_iq2_xs
ik/improve_iq2ks
ik/improve_mmq
ik/interleaved_guards
ik/iq1_kt
ik/iq1_m_neon
ik/iq1_m_r4
ik/iq1_s_checks
ik/iq1_s_gemm
ik/iq1_s_r4
ik/iq1_s_r4_k128
ik/iq1_s_r4_neon
ik/iq1_tn
ik/iq1_tn_cuda
ik/iq1_tn_metal
ik/iq1bn_metal
ik/iq1m_gemm
ik/iq2_bn_r4
ik/iq2_k
ik/iq2_k_r4
ik/iq2_k_tweak
ik/iq2_kl
ik/iq2_s_r4
ik/iq2_tn
ik/iq2_tn_as_iq2_bn
ik/iq2_tn_avx2
ik/iq2_tn_faster_pp
ik/iq2_xs_r4
ik/iq2_xxs_gemm
ik/iq2_xxs_r4
ik/iq2k_experiments
ik/iq2ks_experiments
ik/iq3_k_r4_v2
ik/iq3_ks
ik/iq3_ks_v2
ik/iq3_s_gemm
ik/iq3_s_r4
ik/iq3_s_r4_v2
ik/iq3_xxs_gemm
ik/iq3_xxs_r4
ik/iq3_xxs_r4_v2
ik/iq4_k
ik/iq4_k_r4
ik/iq4_k_r4_avx2
ik/iq4_k_tweaks
ik/iq4_k_xxs
ik/iq4_knn
ik/iq4_ks_r4
ik/iq4_kss
ik/iq4_kss_improvements
ik/iq4_nl_cache
ik/iq4_nl_x4
ik/iq4_xs_r4
ik/iq4_xs_r4_avx2
ik/iq4_xs_r8
ik/iq4_xs_r8_v2
ik/iq4kss_experiments
ik/iq4nl_kv_cache
ik/iq5_k_r4
ik/iq5_ks
ik/iq5_ks_r4
ik/iq6_k
ik/iq_gemv_tweaks
ik/iqk_fattn_all_quants
ik/iqk_gemm
ik/iqk_mmvq_opt
ik/iqk_q_improvements
ik/is_this_better_for_multi_gpu
ik/issue_214
ik/issue_217
ik/issue_224
ik/issue_230
ik/k_cache_hadamard
ik/k_cache_hadamard_cuda
ik/keep_mmap_with_no_pinned
ik/kq_fused_softmax
ik/kq_mask
ik/kq_mask_padding_64
ik/l4_rms_norm
ik/legacy_gemm
ik/limit_amb
ik/llama4
ik/llama_bench_fit
ik/llama_bench_mla3
ik/llama_bench_n_cpu_moe
ik/llama_bench_overrides
ik/llama_bench_rcache
ik/llama_bench_sas
ik/llama_bench_sm_arg
ik/llama_bench_tgb
ik/llama_hparams_add_mla
ik/llama_warnings
ik/log_probs_on_crash
ik/logging_cleanup
ik/make_biased_gemv_optional
ik/make_qx_quants
ik/mask_mt
ik/max_nodes
ik/max_nodes_again
ik/measure_barriers
ik/mellum_sm_graph
ik/merge_Aug_12_2024
ik/merge_July_26_2024
ik/merge_only_qk
ik/merge_qkv
ik/merge_up_gate_exps_2
ik/merge_up_gate_exps_3
ik/metal_bf16
ik/metal_faster_iq4ks
ik/metal_fattn_update
ik/metal_fix_iq2k
ik/metal_fix_iq3k
ik/metal_moe
ik/metal_new_trellis
ik/mimo2
ik/mimo2.5
ik/mimo2_4_gpus
ik/mimo2_graph
ik/minimax2_very_fast
ik/minimax_graph_minor
ik/ministral3
ik/minmax2_sm_graph
ik/minor_delta_tweak
ik/minor_iq2ks_tweak
ik/minor_mtp1
ik/minor_silu
ik/mistral3_large
ik/mistral3_std_attn
ik/mistral4
ik/mistral4_cpu_fa
ik/mixd_kv_cache
ik/mla
ik/mla2_q80_cache
ik/mla2_q80_cache_cpu
ik/mla=3_by_default
ik/mla_add_extra_nodes
ik/mla_fixes
ik/mla_guard
ik/mla_imatrix
ik/mla_no_transposed_cache
ik/mla_q80
ik/mla_smgraph
ik/mmq_id_thresh
ik/mmq_iq_ks_r4
ik/mmq_to_cublas
ik/mmvq_args
ik/mmvq_fuse_bias
ik/mmvq_type_supported
ik/model_fit
ik/moe_fused_unary
ik/moe_offload_strategy
ik/more_set_device
ik/mtmd_kq_type
ik/mtmd_reduce_memory_use
ik/mtp_accept_only_last_logits
ik/mtp_async_copies
ik/mtp_per_step_smgraph
ik/mtp_requantize_output
ik/mtp_reuse_graphs
ik/mtp_reuse_graphs_2
ik/mtp_tweaks1
ik/mtp_tweaks_2
ik/mul_mat_bf16
ik/mul_mat_ext
ik/multi_add
ik/mv_q4_0_r4
ik/mxfp4
ik/n_cpu_moe
ik/nccl1
ik/nccl2
ik/nccl3
ik/nccl3_async
ik/neon_bf16
ik/neon_flash_attention_2
ik/neon_flash_attention_3
ik/neon_improve_legacy_quants
ik/neon_iq3_kt
ik/new_iq1bn
ik/new_iq2kt
ik/new_iq2kt_v2
ik/new_iq4kt
ik/new_trellis_2
ik/no_KV_for_unused_layers
ik/non_contiguous_rope
ik/offline_repack
ik/offline_repack_patterns
ik/offload_policy
ik/ooae2
ik/ooae_on_by_default
ik/opt_kt_quants
ik/option_cpu_fa
ik/option_to_disable_cuda_fusion
ik/optional_yarn_log_multiplier
ik/ot_ffn_gate_up
ik/p2p_cpy_set_device
ik/per_gpu_fit_margin
ik/per_row_scale
ik/per_step_conv_states
ik/phi3.5_tweaks
ik/pickup_13095
ik/pinned_suggest
ik/play_with_barrier
ik/poc_tp
ik/poc_tp_glm4.5
ik/pre_merged_up_gate
ik/prepare_wk_b
ik/q2_k_r4
ik/q35_tweaks
ik/q3_k_r4
ik/q3next_concat
ik/q3next_concat_cpu
ik/q3next_cuda_graphs
ik/q3next_opt2
ik/q3next_opt3
ik/q4_0_r4
ik/q4_0_r8
ik/q4_k_gemm
ik/q4_k_r4
ik/q4_k_r4_v2
ik/q4_k_r4_v3
ik/q5_0_r4
ik/q5_k_r4
ik/q60_mmq
ik/q6_0_r4
ik/q6_k_gemm
ik/q6_k_r4
ik/q8_0_r4
ik/q8_KV
ik/q8_k_r16
ik/q8_k_r8
ik/q8_k_r8_avx512
ik/qkvz_tweak
ik/qkvz_tweak1
ik/qmix_tweaks
ik/qmix_tweaks_2
ik/qstats
ik/quantization_tweaks
ik/quantize_dry_run
ik/quantize_ffn_gate_inp
ik/quantize_fused_up_gate
ik/quantize_gemma4
ik/quantize_mmproj
ik/quantize_options
ik/quantize_q8k_avx2
ik/quantize_stats
ik/qwen3.5_vision
ik/qwen35_model_types
ik/qwen35_std_attn
ik/qwen35dense
ik/qwen35moe
ik/qwen35moe_muge
ik/qwen3_graph
ik/qwen3next
ik/qwen3vl_graph
ik/qwen_mtp_inp_out_ids
ik/qx_0_r4_avx2
ik/qx_k_b32_avx2
ik/r4_faster_zen4
ik/r4_neon
ik/r4_nrcy_16
ik/really_fix_rope_cache
ik/reduce_compute_buffers
ik/reduce_make_copies
ik/reduce_mla3_compute_buffer_size
ik/reduce_no_nccl
ik/reduce_race_quick_fix
ik/refactor_graphs
ik/refactor_iqk
ik/refactor_llama.cpp
ik/remove_iqk_option
ik/remove_kv_l
ik/remove_llamafile
ik/remove_scary_warning
ik/remove_unnecessary_calls
ik/remove_unnessessary_ids_copy
ik/rename_4_8
ik/rename_iq4_nl_x4
ik/reorg_mmvq_and_fuse_bias
ik/repack_also_experts
ik/repack_f16
ik/reset_1st_recurrent_graph
ik/revert_0bf4d997
ik/revert_1496
ik/revert_1687
ik/revert_739
ik/revert_delta_net_3
ik/reverts
ik/ring_reduce
ik/rm_Makefile
ik/rms_block_size
ik/rng_sampling
ik/rope_cache
ik/rtr_plus_muge
ik/run_time_repack
ik/sampling-top-n-sigma
ik/sampling-xtc
ik/sampling_refactor_sorting
ik/sampling_top_n_sigma
ik/sanitize_importance_iqk
ik/sanitize_importance_kt_quants
ik/sched_copy_experts
ik/sched_max_copies=1
ik/server_send_done
ik/set_draft_input_hidden_state
ik/shexps_better_hybrid
ik/simplify_delta_net
ik/simplify_delta_net_2
ik/skip_get_rows
ik/skip_noop_barriers
ik/skip_rowids_computation
ik/skip_unnecessary_quantize
ik/slightly_better_fdn
ik/slightly_better_graph_split_strategy
ik/sm_graph_cuda_graphs
ik/sm_graph_delta_net
ik/sm_graph_disable_cuda_graphs
ik/sm_graph_gemma4_moe
ik/sm_graph_max_gpu
ik/sm_graph_muge
ik/sm_graph_partial_offload
ik/sm_graph_pre_merged_up_gate
ik/sm_graph_q35
ik/sm_graph_q3next
ik/sm_graph_qwen35moe
ik/sm_graph_rearrange
ik/sm_graph_seedoss
ik/sm_graph_step35
ik/sm_graph_sync
ik/smart_expert_selection
ik/smollm3
ik/softcap
ik/softcap_minor
ik/split_graph_2
ik/split_mode_f32
ik/ssm_conv4_avx2
ik/ssm_conv4_silu
ik/standardize_gemma4
ik/step35
ik/step35_compat
ik/support_gigachat
ik/sweep_bench_n_predict
ik/sweep_bench_nrep
ik/sweep_bench_warmup
ik/swiglu
ik/sync_fa
ik/tensor_override_honor_mmap
ik/test_q80_NaNs
ik/test_thp
ik/tg_tweaks
ik/topk_moe_fuse_bias
ik/topk_moe_with_norm
ik/trellis_bf16
ik/trellis_metal
ik/trellis_neon
ik/trellis_opt
ik/trinet
ik/try_authors
ik/try_cuda_graphs
ik/try_fa_no_q80_repack
ik/try_fix_1014
ik/try_fix_1201
ik/try_fix_1222
ik/try_fix_367
ik/try_fix_367_v2
ik/try_fix_690
ik/try_fix_772
ik/try_fix_854
ik/try_fix_974
ik/try_fix_avx2_fa
ik/try_fix_many_gpus
ik/try_fix_many_gpus_2
ik/try_grouped_topk_playing1
ik/try_minimax_better_sm_graph
ik/try_remove_cpy_indirection
ik/try_split_mla
ik/try_split_offloaded_moe_up_gate
ik/try_svd
ik/try_trellis
ik/undo_1049_if_tensor_overrides
ik/undo_1421
ik/undo_sync_reduction
ik/update_authors
ik/update_license
ik/use_bf16_when_no_mmq
ik/use_mmq_id_for_moe
ik/use_q8_2
ik/v_cache_hadamard
ik/validate_quants_on_load
ik/vendor
ik/vulkan1
ik/vulkan_again
ik/vulkan_disable_fused_ops
ik/vulkan_disable_multi_add
ik/vulkan_fattn
ik/vulkan_fused_mul_unary
ik/vulkan_fused_rms
ik/vulkan_multi_add
ik/warn_pinned_alloc
ik/wip_sync_llama
ik/worst_graph_tokens
ik/zen4_faster_iq4ks_iq5ks
ik/zen4_flash_attn
ik/zen4_flash_attn_2
ik/zen4_flash_attn_bf16
ik/zen4_iq4_xs_r4
ik/zen4_repack_f16
ikawrakow-patch-1
ikawrakow-patch-1-1
ikawrakow-patch-2
main
revert-1696-fix/recurrent-state-reset
s6/MLA_prompt_save_restore_fix
s6/bitnet2b_2501
s6/bitnet_name_update
s6/cache_default
s6/deci_support
s6/docs_update
s6/dots
s6/fix_kshift_crash
s6/fix_prompt_tokenization
s6/fix_python
s6/fp8_native
s6/imatrix_conv
s6/list_prompt_cache
s6/mikupad
s6/mla
s6/numa_KV
s6/qwen3_dynamic_yarn
s6/readme-minor1
s6/readme-minor2
s6/readme_update
s6/remove_kv_l
s6/rope_freq_fix
s6/rpc
s6/seed_support2
s6/sweep_bench
s6/sweep_bench_update
s6/termux_fix
s6/warmup
#1
#10
#1000
#1001
#1003
#1004
#1005
#1006
#1007
#1008
#101
#1011
#1012
#1016
#1017
#1018
#102
#1022
#1023
#1024
#1025
#1026
#1027
#1029
#1030
#1031
#1032
#1033
#1034
#1035
#1036
#1037
#1038
#1039
#1040
#1042
#1047
#1048
#1049
#105
#1050
#1051
#1052
#1053
#1054
#1056
#1057
#1058
#1059
#106
#1060
#1061
#1062
#1063
#1064
#1065
#1067
#1068
#1069
#107
#1070
#1071
#1073
#1079
#108
#1080
#1082
#1086
#1087
#1088
#1089
#109
#1091
#1092
#1093
#1094
#1096
#1097
#11
#110
#1100
#1101
#1103
#1104
#1105
#1106
#1107
#111
#1110
#1112
#1114
#1115
#1116
#1118
#1119
#112
#1120
#1121
#1124
#1126
#1128
#1129
#113
#1130
#1131
#1131
#1134
#1135
#1136
#1137
#1138
#1139
#114
#1140
#1141
#1143
#1144
#1147
#115
#1151
#1152
#1153
#1154
#1155
#1156
#116
#1160
#1161
#1164
#1165
#1166
#1168
#117
#1170
#1171
#1172
#1174
#1175
#1176
#1177
#1178
#1179
#118
#1182
#1183
#1184
#1185
#1187
#119
#1190
#1191
#1192
#1193
#1194
#1195
#1196
#1198
#1199
#12
#120
#1202
#1206
#1207
#1208
#121
#1211
#1212
#1213
#1214
#1215
#1216
#1217
#1218
#122
#1220
#1221
#1222
#1223
#1224
#1226
#123
#1231
#1235
#1236
#1238
#1239
#124
#1240
#1241
#1243
#1244
#1249
#125
#1250
#1251
#1252
#1257
#126
#1260
#1261
#1262
#1263
#1266
#1268
#1269
#127
#1270
#1272
#1274
#1275
#1276
#1277
#1278
#1279
#128
#1280
#1283
#1284
#1285
#1286
#1287
#1288
#129
#1292
#1295
#1296
#13
#130
#1300
#1301
#1303
#1304
#1305
#1306
#1307
#1308
#1309
#131
#1310
#1311
#1313
#1314
#1315
#1318
#132
#1320
#1321
#1322
#1326
#1328
#1329
#1330
#1331
#1332
#1333
#1335
#1336
#1337
#1339
#134
#1340
#1345
#1346
#1347
#1349
#135
#1350
#1352
#1354
#1355
#1359
#136
#1361
#1362
#1365
#1366
#1367
#1368
#1369
#137
#1371
#1372
#1373
#1374
#1375
#1376
#1377
#1378
#138
#1386
#1388
#139
#1392
#1393
#1397
#1398
#14
#1400
#1402
#1403
#1404
#1405
#1407
#1408
#141
#1410
#1412
#1413
#1417
#1418
#1419
#142
#1421
#1422
#1423
#1424
#1425
#1426
#1427
#1429
#143
#1430
#1433
#1435
#1436
#1437
#1439
#144
#1440
#1441
#1443
#1444
#1446
#1447
#145
#1450
#1451
#1452
#1454
#1455
#1456
#1458
#1459
#146
#1460
#1462
#1463
#1464
#1466
#1467
#1468
#1469
#147
#1472
#1474
#1475
#1476
#1477
#1479
#148
#1482
#1483
#1485
#149
#1490
#1491
#1492
#1493
#1494
#1496
#1497
#1498
#1499
#150
#1501
#1503
#1504
#1505
#1506
#1508
#151
#1510
#1511
#1512
#1513
#1515
#1516
#1517
#1518
#1519
#152
#1521
#1526
#1527
#153
#1530
#1531
#1535
#1539
#154
#1540
#1542
#1543
#1546
#1547
#1548
#1549
#155
#1550
#1553
#1556
#1558
#1558
#156
#1560
#1561
#1562
#1564
#1565
#1567
#157
#1570
#1571
#1573
#1574
#1577
#1578
#1579
#158
#1581
#1582
#1583
#1584
#1585
#1590
#1592
#1593
#1593
#1595
#1596
#1597
#1598
#1599
#16
#1600
#1601
#1603
#1604
#1606
#1609
#161
#1610
#1615
#1617
#1617
#162
#1625
#1626
#1627
#163
#1633
#1634
#1635
#1637
#1638
#1641
#1644
#1645
#1646
#1647
#1648
#1649
#1651
#1652
#1653
#1654
#1654
#1657
#1659
#1666
#1669
#1672
#1673
#1677
#1679
#168
#1682
#1683
#1686
#1687
#1688
#1689
#169
#1690
#1691
#1696
#1698
#17
#170
#1700
#1701
#1702
#1703
#1704
#1707
#171
#1710
#1713
#1714
#1716
#1717
#1718
#172
#1721
#1722
#1723
#1724
#1726
#1727
#1727
#1728
#1729
#173
#1731
#1732
#1733
#1734
#1735
#1736
#1738
#1738
#174
#1741
#1743
#1744
#1745
#1746
#1746
#1748
#175
#1750
#1753
#1755
#1756
#1757
#1758
#1759
#176
#1760
#1761
#1764
#1764
#1767
#177
#1770
#1771
#1773
#1774
#1776
#1777
#1778
#178
#1780
#1781
#1782
#1783
#1784
#1785
#1786
#1787
#1788
#1789
#179
#1791
#1792
#1794
#1795
#1796
#1797
#1798
#1799
#180
#1800
#1801
#1803
#1804
#1805
#1806
#1808
#1809
#181
#1810
#1813
#1815
#1816
#1817
#1819
#182
#1820
#1821
#1822
#1825
#1826
#1827
#1828
#1830
#1830
#1832
#1834
#1835
#1838
#184
#1840
#1841
#1844
#1846
#1847
#1848
#1849
#1849
#185
#1851
#1852
#1853
#1853
#1854
#1855
#1857
#1858
#186
#1860
#1861
#1862
#1866
#1867
#1869
#187
#1870
#1871
#1872
#1873
#1876
#1877
#1877
#1879
#188
#1880
#1881
#1883
#1884
#1885
#1886
#1887
#1888
#1888
#1889
#189
#1890
#1892
#1892
#1893
#1893
#1894
#1894
#1895
#1897
#1899
#19
#190
#1901
#1903
#1904
#1906
#1907
#1908
#191
#1911
#1911
#1912
#1913
#1914
#1914
#1918
#1919
#192
#1920
#1921
#1922
#1923
#1923
#1924
#1924
#1925
#1925
#193
#194
#195
#197
#198
#2
#20
#200
#202
#204
#205
#206
#207
#208
#21
#210
#212
#213
#215
#216
#218
#219
#22
#220
#225
#226
#229
#23
#231
#232
#233
#234
#235
#236
#237
#238
#239
#24
#240
#241
#243
#244
#246
#247
#248
#250
#251
#252
#253
#259
#260
#261
#262
#264
#265
#268
#269
#27
#270
#272
#273
#274
#275
#276
#277
#278
#279
#28
#280
#282
#283
#284
#287
#289
#290
#291
#292
#294
#295
#298
#299
#3
#301
#302
#303
#304
#307
#309
#31
#310
#311
#312
#313
#315
#317
#318
#32
#320
#321
#324
#325
#326
#327
#328
#329
#33
#330
#331
#332
#333
#336
#337
#338
#341
#342
#343
#344
#346
#347
#348
#349
#35
#351
#352
#355
#356
#36
#360
#364
#366
#368
#369
#37
#370
#371
#374
#375
#377
#38
#382
#386
#39
#390
#391
#392
#394
#4
#40
#400
#402
#404
#405
#406
#408
#409
#41
#410
#411
#413
#414
#415
#416
#417
#418
#42
#421
#422
#424
#426
#427
#428
#429
#43
#430
#431
#435
#438
#439
#44
#441
#442
#443
#444
#445
#446
#448
#449
#45
#453
#454
#457
#458
#46
#460
#461
#462
#465
#468
#469
#47
#470
#471
#473
#475
#478
#48
#480
#481
#482
#483
#484
#486
#487
#488
#489
#49
#492
#493
#494
#495
#496
#497
#5
#50
#501
#502
#504
#505
#506
#508
#509
#51
#510
#511
#512
#513
#515
#516
#517
#518
#52
#520
#524
#525
#528
#529
#53
#531
#533
#534
#535
#536
#537
#54
#540
#541
#542
#544
#546
#547
#549
#55
#550
#552
#553
#554
#554
#555
#557
#558
#559
#56
#560
#563
#565
#566
#567
#569
#57
#570
#571
#573
#574
#577
#578
#579
#58
#580
#581
#582
#583
#584
#585
#587
#588
#589
#592
#593
#595
#598
#6
#602
#603
#604
#606
#607
#608
#609
#61
#610
#611
#612
#616
#617
#618
#62
#620
#622
#624
#628
#630
#631
#637
#639
#64
#640
#642
#643
#645
#648
#65
#652
#653
#653
#654
#66
#661
#662
#668
#670
#672
#674
#676
#677
#68
#680
#682
#683
#684
#688
#689
#69
#692
#695
#696
#698
#699
#7
#70
#700
#701
#702
#705
#707
#708
#709
#71
#710
#711
#712
#713
#714
#716
#717
#719
#72
#720
#722
#723
#724
#726
#727
#728
#73
#734
#735
#738
#739
#74
#740
#741
#742
#745
#748
#75
#751
#752
#754
#757
#759
#76
#760
#762
#764
#768
#77
#771
#774
#78
#782
#786
#787
#788
#789
#79
#790
#791
#794
#795
#796
#797
#798
#799
#80
#801
#802
#803
#807
#81
#810
#814
#817
#820
#823
#824
#825
#826
#828
#829
#83
#833
#835
#836
#837
#838
#84
#840
#841
#842
#843
#844
#845
#85
#850
#851
#852
#853
#855
#857
#858
#86
#860
#861
#863
#864
#866
#868
#87
#870
#871
#872
#874
#875
#876
#878
#879
#880
#881
#882
#883
#887
#889
#89
#891
#892
#894
#896
#897
#899
#9
#90
#900
#901
#902
#903
#906
#907
#91
#910
#911
#913
#914
#916
#920
#921
#922
#923
#924
#926
#928
#929
#93
#931
#932
#933
#934
#935
#936
#937
#938
#939
#94
#941
#943
#944
#945
#947
#948
#949
#951
#952
#954
#957
#958
#959
#96
#963
#965
#966
#968
#969
#97
#970
#971
#972
#973
#976
#977
#98
#980
#983
#984
#985
#987
#988
#989
#99
#991
#992
#993
#995
#996
#998
#999
t0002
ik/iq2_bn_r4
ik/iq2_k
ik/iq2_k_r4
ik/iq2_k_tweak
ik/iq2_kl
ik/iq2_s_r4
ik/iq2_tn
ik/iq2_tn_as_iq2_bn
ik/iq2_tn_avx2
ik/iq2_tn_faster_pp
ik/iq2_xs_r4
ik/iq2_xxs_gemm
ik/iq2_xxs_r4
ik/iq2k_experiments
ik/iq2ks_experiments
ik/iq3_k_r4_v2
ik/iq3_ks
ik/iq3_ks_v2
ik/iq3_s_gemm
ik/iq3_s_r4
ik/iq3_s_r4_v2
ik/iq3_xxs_gemm
ik/iq3_xxs_r4
ik/iq3_xxs_r4_v2
ik/iq4_k
ik/iq4_k_r4
ik/iq4_k_r4_avx2
ik/iq4_k_tweaks
ik/iq4_k_xxs
ik/iq4_knn
ik/iq4_ks_r4
ik/iq4_kss
ik/iq4_kss_improvements
ik/iq4_nl_cache
ik/iq4_nl_x4
ik/iq4_xs_r4
ik/iq4_xs_r4_avx2
ik/iq4_xs_r8
ik/iq4_xs_r8_v2
ik/iq4kss_experiments
ik/iq4nl_kv_cache
ik/iq5_k_r4
ik/iq5_ks
ik/iq5_ks_r4
ik/iq6_k
ik/iq_gemv_tweaks
ik/iqk_fattn_all_quants
ik/iqk_gemm
ik/iqk_mmvq_opt
ik/iqk_q_improvements
ik/is_this_better_for_multi_gpu
ik/issue_214
ik/issue_217
ik/issue_224
ik/issue_230
ik/k_cache_hadamard
ik/k_cache_hadamard_cuda
ik/keep_mmap_with_no_pinned
ik/kq_fused_softmax
ik/kq_mask
ik/kq_mask_padding_64
ik/l4_rms_norm
ik/legacy_gemm
ik/limit_amb
ik/llama4
ik/llama_bench_fit
ik/llama_bench_mla3
ik/llama_bench_n_cpu_moe
ik/llama_bench_overrides
ik/llama_bench_rcache
ik/llama_bench_sas
ik/llama_bench_sm_arg
ik/llama_bench_tgb
ik/llama_hparams_add_mla
ik/llama_warnings
ik/log_probs_on_crash
ik/logging_cleanup
ik/make_biased_gemv_optional
ik/make_qx_quants
ik/mask_mt
ik/max_nodes
ik/max_nodes_again
ik/measure_barriers
ik/mellum_sm_graph
ik/merge_Aug_12_2024
ik/merge_July_26_2024
ik/merge_only_qk
ik/merge_qkv
ik/merge_up_gate_exps_2
ik/merge_up_gate_exps_3
ik/metal_bf16
ik/metal_faster_iq4ks
ik/metal_fattn_update
ik/metal_fix_iq2k
ik/metal_fix_iq3k
ik/metal_moe
ik/metal_new_trellis
ik/mimo2
ik/mimo2.5
ik/mimo2_4_gpus
ik/mimo2_graph
ik/minimax2_very_fast
ik/minimax_graph_minor
ik/ministral3
ik/minmax2_sm_graph
ik/minor_delta_tweak
ik/minor_iq2ks_tweak
ik/minor_mtp1
ik/minor_silu
ik/mistral3_large
ik/mistral3_std_attn
ik/mistral4
ik/mistral4_cpu_fa
ik/mixd_kv_cache
ik/mla
ik/mla2_q80_cache
ik/mla2_q80_cache_cpu
ik/mla=3_by_default
ik/mla_add_extra_nodes
ik/mla_fixes
ik/mla_guard
ik/mla_imatrix
ik/mla_no_transposed_cache
ik/mla_q80
ik/mla_smgraph
ik/mmq_id_thresh
ik/mmq_iq_ks_r4
ik/mmq_to_cublas
ik/mmvq_args
ik/mmvq_fuse_bias
ik/mmvq_type_supported
ik/model_fit
ik/moe_fused_unary
ik/moe_offload_strategy
ik/more_set_device
ik/mtmd_kq_type
ik/mtmd_reduce_memory_use
ik/mtp_accept_only_last_logits
ik/mtp_async_copies
ik/mtp_per_step_smgraph
ik/mtp_requantize_output
ik/mtp_reuse_graphs
ik/mtp_reuse_graphs_2
ik/mtp_tweaks1
ik/mtp_tweaks_2
ik/mul_mat_bf16
ik/mul_mat_ext
ik/multi_add
ik/mv_q4_0_r4
ik/mxfp4
ik/n_cpu_moe
ik/nccl1
ik/nccl2
ik/nccl3
ik/nccl3_async
ik/neon_bf16
ik/neon_flash_attention_2
ik/neon_flash_attention_3
ik/neon_improve_legacy_quants
ik/neon_iq3_kt
ik/new_iq1bn
ik/new_iq2kt
ik/new_iq2kt_v2
ik/new_iq4kt
ik/new_trellis_2
ik/no_KV_for_unused_layers
ik/non_contiguous_rope
ik/offline_repack
ik/offline_repack_patterns
ik/offload_policy
ik/ooae2
ik/ooae_on_by_default
ik/opt_kt_quants
ik/option_cpu_fa
ik/option_to_disable_cuda_fusion
ik/optional_yarn_log_multiplier
ik/ot_ffn_gate_up
ik/p2p_cpy_set_device
ik/per_gpu_fit_margin
ik/per_row_scale
ik/per_step_conv_states
ik/phi3.5_tweaks
ik/pickup_13095
ik/pinned_suggest
ik/play_with_barrier
ik/poc_tp
ik/poc_tp_glm4.5
ik/pre_merged_up_gate
ik/prepare_wk_b
ik/q2_k_r4
ik/q35_tweaks
ik/q3_k_r4
ik/q3next_concat
ik/q3next_concat_cpu
ik/q3next_cuda_graphs
ik/q3next_opt2
ik/q3next_opt3
ik/q4_0_r4
ik/q4_0_r8
ik/q4_k_gemm
ik/q4_k_r4
ik/q4_k_r4_v2
ik/q4_k_r4_v3
ik/q5_0_r4
ik/q5_k_r4
ik/q60_mmq
ik/q6_0_r4
ik/q6_k_gemm
ik/q6_k_r4
ik/q8_0_r4
ik/q8_KV
ik/q8_k_r16
ik/q8_k_r8
ik/q8_k_r8_avx512
ik/qkvz_tweak
ik/qkvz_tweak1
ik/qmix_tweaks
ik/qmix_tweaks_2
ik/qstats
ik/quantization_tweaks
ik/quantize_dry_run
ik/quantize_ffn_gate_inp
ik/quantize_fused_up_gate
ik/quantize_gemma4
ik/quantize_mmproj
ik/quantize_options
ik/quantize_q8k_avx2
ik/quantize_stats
ik/qwen3.5_vision
ik/qwen35_model_types
ik/qwen35_std_attn
ik/qwen35dense
ik/qwen35moe
ik/qwen35moe_muge
ik/qwen3_graph
ik/qwen3next
ik/qwen3vl_graph
ik/qwen_mtp_inp_out_ids
ik/qx_0_r4_avx2
ik/qx_k_b32_avx2
ik/r4_faster_zen4
ik/r4_neon
ik/r4_nrcy_16
ik/really_fix_rope_cache
ik/reduce_compute_buffers
ik/reduce_make_copies
ik/reduce_mla3_compute_buffer_size
ik/reduce_no_nccl
ik/reduce_race_quick_fix
ik/refactor_graphs
ik/refactor_iqk
ik/refactor_llama.cpp
ik/remove_iqk_option
ik/remove_kv_l
ik/remove_llamafile
ik/remove_scary_warning
ik/remove_unnecessary_calls
ik/remove_unnessessary_ids_copy
ik/rename_4_8
ik/rename_iq4_nl_x4
ik/reorg_mmvq_and_fuse_bias
ik/repack_also_experts
ik/repack_f16
ik/reset_1st_recurrent_graph
ik/revert_0bf4d997
ik/revert_1496
ik/revert_1687
ik/revert_739
ik/revert_delta_net_3
ik/reverts
ik/ring_reduce
ik/rm_Makefile
ik/rms_block_size
ik/rng_sampling
ik/rope_cache
ik/rtr_plus_muge
ik/run_time_repack
ik/sampling-top-n-sigma
ik/sampling-xtc
ik/sampling_refactor_sorting
ik/sampling_top_n_sigma
ik/sanitize_importance_iqk
ik/sanitize_importance_kt_quants
ik/sched_copy_experts
ik/sched_max_copies=1
ik/server_send_done
ik/set_draft_input_hidden_state
ik/shexps_better_hybrid
ik/simplify_delta_net
ik/simplify_delta_net_2
ik/skip_get_rows
ik/skip_noop_barriers
ik/skip_rowids_computation
ik/skip_unnecessary_quantize
ik/slightly_better_fdn
ik/slightly_better_graph_split_strategy
ik/sm_graph_cuda_graphs
ik/sm_graph_delta_net
ik/sm_graph_disable_cuda_graphs
ik/sm_graph_gemma4_moe
ik/sm_graph_max_gpu
ik/sm_graph_muge
ik/sm_graph_partial_offload
ik/sm_graph_pre_merged_up_gate
ik/sm_graph_q35
ik/sm_graph_q3next
ik/sm_graph_qwen35moe
ik/sm_graph_rearrange
ik/sm_graph_seedoss
ik/sm_graph_step35
ik/sm_graph_sync
ik/smart_expert_selection
ik/smollm3
ik/softcap
ik/softcap_minor
ik/split_graph_2
ik/split_mode_f32
ik/ssm_conv4_avx2
ik/ssm_conv4_silu
ik/standardize_gemma4
ik/step35
ik/step35_compat
ik/support_gigachat
ik/sweep_bench_n_predict
ik/sweep_bench_nrep
ik/sweep_bench_warmup
ik/swiglu
ik/sync_fa
ik/tensor_override_honor_mmap
ik/test_q80_NaNs
ik/test_thp
ik/tg_tweaks
ik/topk_moe_fuse_bias
ik/topk_moe_with_norm
ik/trellis_bf16
ik/trellis_metal
ik/trellis_neon
ik/trellis_opt
ik/trinet
ik/try_authors
ik/try_cuda_graphs
ik/try_fa_no_q80_repack
ik/try_fix_1014
ik/try_fix_1201
ik/try_fix_1222
ik/try_fix_367
ik/try_fix_367_v2
ik/try_fix_690
ik/try_fix_772
ik/try_fix_854
ik/try_fix_974
ik/try_fix_avx2_fa
ik/try_fix_many_gpus
ik/try_fix_many_gpus_2
ik/try_grouped_topk_playing1
ik/try_minimax_better_sm_graph
ik/try_remove_cpy_indirection
ik/try_split_mla
ik/try_split_offloaded_moe_up_gate
ik/try_svd
ik/try_trellis
ik/undo_1049_if_tensor_overrides
ik/undo_1421
ik/undo_sync_reduction
ik/update_authors
ik/update_license
ik/use_bf16_when_no_mmq
ik/use_mmq_id_for_moe
ik/use_q8_2
ik/v_cache_hadamard
ik/validate_quants_on_load
ik/vendor
ik/vulkan1
ik/vulkan_again
ik/vulkan_disable_fused_ops
ik/vulkan_disable_multi_add
ik/vulkan_fattn
ik/vulkan_fused_mul_unary
ik/vulkan_fused_rms
ik/vulkan_multi_add
ik/warn_pinned_alloc
ik/wip_sync_llama
ik/worst_graph_tokens
ik/zen4_faster_iq4ks_iq5ks
ik/zen4_flash_attn
ik/zen4_flash_attn_2
ik/zen4_flash_attn_bf16
ik/zen4_iq4_xs_r4
ik/zen4_repack_f16
ikawrakow-patch-1
ikawrakow-patch-1-1
ikawrakow-patch-2
main
revert-1696-fix/recurrent-state-reset
s6/MLA_prompt_save_restore_fix
s6/bitnet2b_2501
s6/bitnet_name_update
s6/cache_default
s6/deci_support
s6/docs_update
s6/dots
s6/fix_kshift_crash
s6/fix_prompt_tokenization
s6/fix_python
s6/fp8_native
s6/imatrix_conv
s6/list_prompt_cache
s6/mikupad
s6/mla
s6/numa_KV
s6/qwen3_dynamic_yarn
s6/readme-minor1
s6/readme-minor2
s6/readme_update
s6/remove_kv_l
s6/rope_freq_fix
s6/rpc
s6/seed_support2
s6/sweep_bench
s6/sweep_bench_update
s6/termux_fix
s6/warmup
#1
#10
#1000
#1001
#1003
#1004
#1005
#1006
#1007
#1008
#101
#1011
#1012
#1016
#1017
#1018
#102
#1022
#1023
#1024
#1025
#1026
#1027
#1029
#1030
#1031
#1032
#1033
#1034
#1035
#1036
#1037
#1038
#1039
#1040
#1042
#1047
#1048
#1049
#105
#1050
#1051
#1052
#1053
#1054
#1056
#1057
#1058
#1059
#106
#1060
#1061
#1062
#1063
#1064
#1065
#1067
#1068
#1069
#107
#1070
#1071
#1073
#1079
#108
#1080
#1082
#1086
#1087
#1088
#1089
#109
#1091
#1092
#1093
#1094
#1096
#1097
#11
#110
#1100
#1101
#1103
#1104
#1105
#1106
#1107
#111
#1110
#1112
#1114
#1115
#1116
#1118
#1119
#112
#1120
#1121
#1124
#1126
#1128
#1129
#113
#1130
#1131
#1131
#1134
#1135
#1136
#1137
#1138
#1139
#114
#1140
#1141
#1143
#1144
#1147
#115
#1151
#1152
#1153
#1154
#1155
#1156
#116
#1160
#1161
#1164
#1165
#1166
#1168
#117
#1170
#1171
#1172
#1174
#1175
#1176
#1177
#1178
#1179
#118
#1182
#1183
#1184
#1185
#1187
#119
#1190
#1191
#1192
#1193
#1194
#1195
#1196
#1198
#1199
#12
#120
#1202
#1206
#1207
#1208
#121
#1211
#1212
#1213
#1214
#1215
#1216
#1217
#1218
#122
#1220
#1221
#1222
#1223
#1224
#1226
#123
#1231
#1235
#1236
#1238
#1239
#124
#1240
#1241
#1243
#1244
#1249
#125
#1250
#1251
#1252
#1257
#126
#1260
#1261
#1262
#1263
#1266
#1268
#1269
#127
#1270
#1272
#1274
#1275
#1276
#1277
#1278
#1279
#128
#1280
#1283
#1284
#1285
#1286
#1287
#1288
#129
#1292
#1295
#1296
#13
#130
#1300
#1301
#1303
#1304
#1305
#1306
#1307
#1308
#1309
#131
#1310
#1311
#1313
#1314
#1315
#1318
#132
#1320
#1321
#1322
#1326
#1328
#1329
#1330
#1331
#1332
#1333
#1335
#1336
#1337
#1339
#134
#1340
#1345
#1346
#1347
#1349
#135
#1350
#1352
#1354
#1355
#1359
#136
#1361
#1362
#1365
#1366
#1367
#1368
#1369
#137
#1371
#1372
#1373
#1374
#1375
#1376
#1377
#1378
#138
#1386
#1388
#139
#1392
#1393
#1397
#1398
#14
#1400
#1402
#1403
#1404
#1405
#1407
#1408
#141
#1410
#1412
#1413
#1417
#1418
#1419
#142
#1421
#1422
#1423
#1424
#1425
#1426
#1427
#1429
#143
#1430
#1433
#1435
#1436
#1437
#1439
#144
#1440
#1441
#1443
#1444
#1446
#1447
#145
#1450
#1451
#1452
#1454
#1455
#1456
#1458
#1459
#146
#1460
#1462
#1463
#1464
#1466
#1467
#1468
#1469
#147
#1472
#1474
#1475
#1476
#1477
#1479
#148
#1482
#1483
#1485
#149
#1490
#1491
#1492
#1493
#1494
#1496
#1497
#1498
#1499
#150
#1501
#1503
#1504
#1505
#1506
#1508
#151
#1510
#1511
#1512
#1513
#1515
#1516
#1517
#1518
#1519
#152
#1521
#1526
#1527
#153
#1530
#1531
#1535
#1539
#154
#1540
#1542
#1543
#1546
#1547
#1548
#1549
#155
#1550
#1553
#1556
#1558
#1558
#156
#1560
#1561
#1562
#1564
#1565
#1567
#157
#1570
#1571
#1573
#1574
#1577
#1578
#1579
#158
#1581
#1582
#1583
#1584
#1585
#1590
#1592
#1593
#1593
#1595
#1596
#1597
#1598
#1599
#16
#1600
#1601
#1603
#1604
#1606
#1609
#161
#1610
#1615
#1617
#1617
#162
#1625
#1626
#1627
#163
#1633
#1634
#1635
#1637
#1638
#1641
#1644
#1645
#1646
#1647
#1648
#1649
#1651
#1652
#1653
#1654
#1654
#1657
#1659
#1666
#1669
#1672
#1673
#1677
#1679
#168
#1682
#1683
#1686
#1687
#1688
#1689
#169
#1690
#1691
#1696
#1698
#17
#170
#1700
#1701
#1702
#1703
#1704
#1707
#171
#1710
#1713
#1714
#1716
#1717
#1718
#172
#1721
#1722
#1723
#1724
#1726
#1727
#1727
#1728
#1729
#173
#1731
#1732
#1733
#1734
#1735
#1736
#1738
#1738
#174
#1741
#1743
#1744
#1745
#1746
#1746
#1748
#175
#1750
#1753
#1755
#1756
#1757
#1758
#1759
#176
#1760
#1761
#1764
#1764
#1767
#177
#1770
#1771
#1773
#1774
#1776
#1777
#1778
#178
#1780
#1781
#1782
#1783
#1784
#1785
#1786
#1787
#1788
#1789
#179
#1791
#1792
#1794
#1795
#1796
#1797
#1798
#1799
#180
#1800
#1801
#1803
#1804
#1805
#1806
#1808
#1809
#181
#1810
#1813
#1815
#1816
#1817
#1819
#182
#1820
#1821
#1822
#1825
#1826
#1827
#1828
#1830
#1830
#1832
#1834
#1835
#1838
#184
#1840
#1841
#1844
#1846
#1847
#1848
#1849
#1849
#185
#1851
#1852
#1853
#1853
#1854
#1855
#1857
#1858
#186
#1860
#1861
#1862
#1866
#1867
#1869
#187
#1870
#1871
#1872
#1873
#1876
#1877
#1877
#1879
#188
#1880
#1881
#1883
#1884
#1885
#1886
#1887
#1888
#1888
#1889
#189
#1890
#1892
#1892
#1893
#1893
#1894
#1894
#1895
#1897
#1899
#19
#190
#1901
#1903
#1904
#1906
#1907
#1908
#191
#1911
#1911
#1912
#1913
#1914
#1914
#1918
#1919
#192
#1920
#1921
#1922
#1923
#1923
#1924
#1924
#1925
#1925
#193
#194
#195
#197
#198
#2
#20
#200
#202
#204
#205
#206
#207
#208
#21
#210
#212
#213
#215
#216
#218
#219
#22
#220
#225
#226
#229
#23
#231
#232
#233
#234
#235
#236
#237
#238
#239
#24
#240
#241
#243
#244
#246
#247
#248
#250
#251
#252
#253
#259
#260
#261
#262
#264
#265
#268
#269
#27
#270
#272
#273
#274
#275
#276
#277
#278
#279
#28
#280
#282
#283
#284
#287
#289
#290
#291
#292
#294
#295
#298
#299
#3
#301
#302
#303
#304
#307
#309
#31
#310
#311
#312
#313
#315
#317
#318
#32
#320
#321
#324
#325
#326
#327
#328
#329
#33
#330
#331
#332
#333
#336
#337
#338
#341
#342
#343
#344
#346
#347
#348
#349
#35
#351
#352
#355
#356
#36
#360
#364
#366
#368
#369
#37
#370
#371
#374
#375
#377
#38
#382
#386
#39
#390
#391
#392
#394
#4
#40
#400
#402
#404
#405
#406
#408
#409
#41
#410
#411
#413
#414
#415
#416
#417
#418
#42
#421
#422
#424
#426
#427
#428
#429
#43
#430
#431
#435
#438
#439
#44
#441
#442
#443
#444
#445
#446
#448
#449
#45
#453
#454
#457
#458
#46
#460
#461
#462
#465
#468
#469
#47
#470
#471
#473
#475
#478
#48
#480
#481
#482
#483
#484
#486
#487
#488
#489
#49
#492
#493
#494
#495
#496
#497
#5
#50
#501
#502
#504
#505
#506
#508
#509
#51
#510
#511
#512
#513
#515
#516
#517
#518
#52
#520
#524
#525
#528
#529
#53
#531
#533
#534
#535
#536
#537
#54
#540
#541
#542
#544
#546
#547
#549
#55
#550
#552
#553
#554
#554
#555
#557
#558
#559
#56
#560
#563
#565
#566
#567
#569
#57
#570
#571
#573
#574
#577
#578
#579
#58
#580
#581
#582
#583
#584
#585
#587
#588
#589
#592
#593
#595
#598
#6
#602
#603
#604
#606
#607
#608
#609
#61
#610
#611
#612
#616
#617
#618
#62
#620
#622
#624
#628
#630
#631
#637
#639
#64
#640
#642
#643
#645
#648
#65
#652
#653
#653
#654
#66
#661
#662
#668
#670
#672
#674
#676
#677
#68
#680
#682
#683
#684
#688
#689
#69
#692
#695
#696
#698
#699
#7
#70
#700
#701
#702
#705
#707
#708
#709
#71
#710
#711
#712
#713
#714
#716
#717
#719
#72
#720
#722
#723
#724
#726
#727
#728
#73
#734
#735
#738
#739
#74
#740
#741
#742
#745
#748
#75
#751
#752
#754
#757
#759
#76
#760
#762
#764
#768
#77
#771
#774
#78
#782
#786
#787
#788
#789
#79
#790
#791
#794
#795
#796
#797
#798
#799
#80
#801
#802
#803
#807
#81
#810
#814
#817
#820
#823
#824
#825
#826
#828
#829
#83
#833
#835
#836
#837
#838
#84
#840
#841
#842
#843
#844
#845
#85
#850
#851
#852
#853
#855
#857
#858
#86
#860
#861
#863
#864
#866
#868
#87
#870
#871
#872
#874
#875
#876
#878
#879
#880
#881
#882
#883
#887
#889
#89
#891
#892
#894
#896
#897
#899
#9
#90
#900
#901
#902
#903
#906
#907
#91
#910
#911
#913
#914
#916
#920
#921
#922
#923
#924
#926
#928
#929
#93
#931
#932
#933
#934
#935
#936
#937
#938
#939
#94
#941
#943
#944
#945
#947
#948
#949
#951
#952
#954
#957
#958
#959
#96
#963
#965
#966
#968
#969
#97
#970
#971
#972
#973
#976
#977
#98
#980
#983
#984
#985
#987
#988
#989
#99
#991
#992
#993
#995
#996
#998
#999
t0002
-
6b9de3dbaa
Fix mrope application across chunk boundaries (Fixes #993 and #1902 -- part 2) (#1918)
main
Farmadupe
2026-06-05 16:10:02 +01:00 -
c3b975eb04
CPU FA: Check for empty attention mask
ik/check_for_empty_mask
Kawrakow
2026-06-05 09:21:15 +00:00 -
1b53a58bf9
Enable split mode graph for Gemma4-12B (#1922)
Kawrakow
2026-06-05 10:59:22 +02:00 -
68a94ab930
Enable split mode graph for Gemma4-12B
ik/gemma4_12B_smgraph
Kawrakow
2026-06-04 16:12:53 +00:00 -
1520eda980
prompt cache: Fix assertion that prompt cache does not rewind to middle of image (#1913)
Farmadupe
2026-06-04 16:53:06 +01:00 -
19dcc1f7d1
CUDA : support head_dim 512 with gqa_ratio % 8 (unblocks Gemma 4 12B) (#1921)
Chip Bradford
2026-06-04 08:36:10 -07:00 -
007d640098
Standardize speculative decoding arguments on the server (#1908)
Samuel Oliveira Alves
2026-06-04 10:44:57 -03:00 -
6c0180d702
server: enable mcp proxy (#1904)
firecoperana
2026-06-04 08:43:07 -05:00 -
074fc7dafd
webui: update llamacpp webui (#1903)
firecoperana
2026-06-04 08:41:23 -05:00 -
4406e637b5
Split mode graph for Mellum (#1920)
Kawrakow
2026-06-04 15:20:41 +02:00 -
0ad43359a4
Split mode graph for Mellum
ik/mellum_sm_graph
Kawrakow
2026-06-04 13:13:33 +00:00 -
976704d509
webui: update llamacpp webui
fcp/webui_update2
firecoperana
2026-05-27 16:56:36 -05:00 -
dc51c6f9b2
Add Mellum2 architecture support (#1919)
Joel Farthing
2026-06-04 07:28:02 -05:00 -
e08ad51f15
Insert image pad markers for kimi K2.5 and K2.6 (#1912)
Farmadupe
2026-06-04 08:27:28 +01:00 -
167386bc26
Add cors proxy
fcp/cors-proxy
firecoperana
2026-05-30 09:24:29 -05:00 -
e1b889337e
update http lib
firecoperana
2026-05-30 09:20:04 -05:00 -
3f40e73c36
expand np guardrail for all mtp types (#1901)
Samuel Oliveira Alves
2026-05-30 10:19:53 -03:00 -
8960c5ba5e
Add extra nodes when dealing with MLA and amb (#1899)
Kawrakow
2026-05-29 15:17:24 +03:00 -
adeff7dbd3
Add extra nodes when dealing with MLA and amb
ik/mla_add_extra_nodes
Kawrakow
2026-05-29 08:38:24 +00:00 -
e75337fec3
quantize: add exception for Gemma4 (#1897)
Kawrakow
2026-05-29 10:54:21 +03:00 -
ccc48d33c7
quantize: add exception for Gemma4
ik/quantize_gemma4
Kawrakow
2026-05-29 06:10:15 +00:00 -
20c2f6d97f
Add lower bound to the -amb command line argument
ik/limit_amb
Kawrakow
2026-05-29 05:22:20 +00:00 -
6eff055a0c
GLM-5 MTP (again) (#1890)
Kawrakow
2026-05-28 18:14:12 +03:00 -
6648aa2e6e
Fix Gemma4 vision
ik/gemma4_mtmd_blindness
Kawrakow
2026-05-28 15:08:46 +00:00 -
a18eeb01cb
Qwen3.5 MTP: extract selected tokens earlier
ik/qwen_mtp_inp_out_ids
Kawrakow
2026-05-28 11:50:51 +00:00 -
7cf668f797
Make MTP work with split mode graph
ik/glm5_mtp
Kawrakow
2026-05-28 04:12:34 +00:00 -
3bf7e836c2
Allow Hadamard transform for head sizes that are not power of 2 (#1883)
Kawrakow
2026-05-27 18:29:32 +03:00 -
5a10d701f9
Arghh
ik/hadamard_block_size
Kawrakow
2026-05-27 13:49:58 +00:00 -
de4257303f
GLM5 MTP: just reuse the layer attention implementation
Kawrakow
2026-05-27 06:51:24 +00:00 -
e84e146f68
Merge remote-tracking branch 'iwan/main' into ik/glm5_mtp
Kawrakow
2026-05-27 05:54:11 +00:00 -
0158d384c0
Fix GLM5 MTP
Kawrakow
2026-05-27 05:49:36 +00:00 -
0cf0dc76ef
Give more details why Hadamard is not possible
Kawrakow
2026-05-27 05:01:22 +00:00 -
e904efed72
Allow Hadamard transform for head sizes that are not power of 2
Kawrakow
2026-05-26 10:00:39 +00:00 -
d0e29862fc
Disable K Hadamard transform if K-head size is not a power of 2
Kawrakow
2026-05-26 07:19:08 +00:00 -
d503b046f7
Fix GLM MTP with split mode graph (#1887)
Kawrakow
2026-05-27 07:24:28 +03:00 -
1f66f9912f
Fix crash with GLM and MTP (#1885)
Kawrakow
2026-05-27 07:24:05 +03:00 -
68d818269e
Fix GLM MTP with split mode graph
ik/fix_glm_mtp_smgraph
Kawrakow
2026-05-26 16:30:12 +00:00 -
d2da6da05c
Fix cache loading/saving for MLA models and split mode graph (#1884)
Kawrakow
2026-05-26 17:07:40 +03:00 -
5425749950
Fix crash with GLM and MTP
ik/fix_glm_mtp_accept
Kawrakow
2026-05-26 14:02:41 +00:00 -
4fbd0c441b
fa: preserve early-termination, fix multi-slot correctness via union of masks (#1880)
Gearstickle
2026-05-26 06:16:49 -07:00 -
e5732606c5
Fix cache loading/saving for MLA models and split mode graph
ik/fix_mla_smgraph_cache_load_save
Kawrakow
2026-05-26 12:37:12 +00:00 -
f3e929c25e
Disable K Hadamard transform if K-head size is not a power of 2
ik/disable_khadamard_if_not_power2
Kawrakow
2026-05-26 07:19:08 +00:00 -
b4e1d916c5
Per GPU fit margin (#1872)
Kawrakow
2026-05-25 08:16:45 +03:00 -
9f7ba245ab
Update autofix and presets (#1867)
Samuel Oliveira Alves
2026-05-24 01:30:44 -03:00 -
0c45696db4
Minor logging cleanup (#1873)
Kawrakow
2026-05-24 07:29:32 +03:00 -
809a63bbb7
Fix MLA models with ngl < n_layer (#1870)
Kawrakow
2026-05-24 07:29:17 +03:00 -
4646800a89
Merge branch 'main' into feat/glm5-mtp
Kawrakow
2026-05-23 19:22:34 +03:00 -
642c038ccd
Extend expiring logit bias to other sampling parameters (#1770)
dungquixote42
2026-05-23 12:19:12 -04:00 -
c7211cc500
Minor logging cleanup
ik/logging_cleanup
Kawrakow
2026-05-23 16:05:54 +00:00 -
e5abe3a86a
Per GPU fit margin
ik/per_gpu_fit_margin
Kawrakow
2026-05-23 15:24:19 +00:00 -
40d8cb196a
llama-quantize: enable --extra-output-tensor with COPY (#1871)
Justin Martin
2026-05-23 10:52:34 +00:00 -
d065b9f742
It is actually not related to split mode graph
ik/fix_partial_ngl_smgraph_mla
Kawrakow
2026-05-23 10:47:57 +00:00 -
2c893906e2
Fix split mode graph with ngl < n_layer (MLA models)
Kawrakow
2026-05-23 10:28:46 +00:00 -
a6bb509305
Fix split mode graph with ngl < n_layer (#1869)
Kawrakow
2026-05-23 12:58:09 +03:00 -
516dfb39f3
Fix split mode graph with ngl < n_layer
ik/fix_partial_ngl_smgraph
Kawrakow
2026-05-23 06:43:52 +00:00 -
3f45ba9387
MTP tweaks 3 (#1862)
Kawrakow
2026-05-23 07:23:20 +03:00 -
19e09e81d4
Change MTP graph input preparation with additional parameters and validation checks (#1866)
Samuel Oliveira Alves
2026-05-23 01:22:04 -03:00 -
8bf4e6ca50
MTP tweaks 3
ik/mtp_accept_only_last_logits
Kawrakow
2026-05-22 09:37:36 +00:00 -
b3d39cff8b
Fix split mode graph for Qwen35-MoE + MTP (#1861)
Kawrakow
2026-05-22 09:23:53 +03:00 -
fa1f302d77
Fix split mode graph for Qwen35-MoE + MTP
ik/fix_q35moe_mtp_smgraph
Kawrakow
2026-05-22 06:11:14 +00:00 -
b26521b9ef
Fix raw-vs-local device id confusion under -dev/-devd subsets (#1826)
thad0ctor
2026-05-21 22:32:52 -07:00 -
d51036a0c4
fix: reset KV cache and prompt state in server_slot and server_context (#1860)
Samuel Oliveira Alves
2026-05-22 02:14:47 -03:00 -
48a55f74e4
Disable split mode graph for Qwen35-MoE when MTP is enabled (#1858)
Kawrakow
2026-05-21 16:29:35 +03:00 -
0bcfde9518
Disable split mode graph for Qwen35-MoE when MTP is enabled
ik/disable_smgraph_qwen35moe_mtp
Kawrakow
2026-05-21 13:24:37 +00:00 -
4b73de246b
Fix crash with split mode graph and partial offload (#1857)
Kawrakow
2026-05-21 13:36:01 +03:00 -
dd123f9f4f
Fix crash with split mode graph and partial offload
ik/fix_partial_offload_crash
Kawrakow
2026-05-21 10:31:51 +00:00 -
c5dc847d0a
Fix Gemma4-E4B compute graph (#1855)
Kawrakow
2026-05-21 12:46:28 +03:00 -
f1e146859b
Fix Gemma4-E4B compute graph
ik/fix_gemma_e4b
Kawrakow
2026-05-21 08:19:30 +00:00 -
3dd282358b
Fix compiler warnings
Kawrakow
2026-05-21 05:40:08 +00:00 -
7b73f45541
Add adaptive sampling clone and free functions to manage memory (#1851)
Samuel Oliveira Alves
2026-05-21 02:11:17 -03:00 -
aefb8bdd99
MLA TP -khad: ggml_dequant_hadamard fused op + wv_b/wk_b_pp Hadamard fold (#1852)
David Young
2026-05-21 05:29:15 +01:00 -
11a1fea9e2
Move embedding management to speculative (#1825)
Samuel Oliveira Alves
2026-05-20 11:42:48 -03:00 -
dd67a9fb24
MLA TP prompt processing optimisation (#1841)
David Young
2026-05-20 15:03:05 +01:00 -
40254a51da
Fix MTP when -no-gr is used (#1848)
Kawrakow
2026-05-20 13:38:33 +03:00 -
a3d46a963a
Fix MTP when -no-gr is used
ik/fix_mtp_no_gr
Kawrakow
2026-05-20 10:36:48 +00:00 -
c2c12c987d
Fix mla = 1 / mla = 3 confusion
ik/fix_mla1
Kawrakow
2026-05-20 07:02:00 +00:00 -
eb597df91f
Update AUTHORS
Kawrakow
2026-05-20 06:38:39 +00:00 -
290935be79
Remove Makefile (#1847)
Kawrakow
2026-05-20 09:14:28 +03:00 -
8b9db5efcb
Remove Makefile
ik/rm_Makefile
Kawrakow
2026-05-20 06:12:59 +00:00 -
6bb3ee3a32
Enable split mode graph for MLA models and partial offload (#1835)
Kawrakow
2026-05-20 07:13:55 +03:00 -
9ae0fb7b2f
Remove reasoning budget logs (#1846)
firecoperana
2026-05-19 23:12:02 -05:00 -
77413bc900
Add Hadamard parameters to draft model loading (#1840)
Samuel Oliveira Alves
2026-05-19 12:30:41 -03:00 -
9326ae5f37
MLA TP prompt processing optimisation
David Young
2026-05-19 16:04:30 +01:00 -
997c587a6c
Fix #1837 (#1838)
Kawrakow
2026-05-19 17:56:21 +03:00 -
301f8d9afd
Fix #1837
ik/fix_dst_backend
Kawrakow
2026-05-19 14:54:13 +00:00 -
2575143637
Enable split mode graph for MLA models and partial offload
ik/enable_smgraph_mla_hybrid
Kawrakow
2026-05-19 12:34:03 +00:00 -
27d7a74389
Compiler warnings
Kawrakow
2026-05-19 05:51:27 +00:00 -
9ad8b8c6db
common/grammar: fix grammar parsing issues to prevent stack overflow and hangs (#1822)
firecoperana
2026-05-19 00:36:49 -05:00 -
c07a052315
MLA tensor parallelism under -sm graph (DEEPSEEK2/GLM_DSA/MISTRAL4) (#1821)
David Young
2026-05-19 06:36:17 +01:00 -
104846ddee
spec : discard last drafted token with low prob (#1820)
firecoperana
2026-05-19 00:35:35 -05:00 -
f43a9f1cf6
Add per-byte CUDA MoE offload threshold (#1813)
Joel Farthing
2026-05-19 00:35:05 -05:00 -
f645ed1e2d
AutoParser: improve reasoning budget and handling of space/newline in tool calls (#1819)
firecoperana
2026-05-19 00:34:19 -05:00 -
7dd19e197d
Some tweaks
ik/mla_smgraph
Kawrakow
2026-05-18 15:52:26 +00:00 -
40aae0b6d8
Check for output_extra.weight when loading Gemma4 assistant models (#1817)
Kawrakow
2026-05-18 08:17:05 +03:00 -
a407b9ca3d
Fix Qwen3.6-MoE low MTP acceptance rate (#1815)
Kawrakow
2026-05-18 07:26:17 +03:00 -
ee51a7aef4
get normed embeddings for glm 5
SamuelOliveirads
2026-05-17 21:27:05 -03:00 -
d2ccbe92a6
MLA tensor parallelism under -sm graph (DEEPSEEK2/GLM_DSA/MISTRAL4)
David Young
2026-05-17 19:14:49 +01:00 -
c35189d83c
fix(server): reset chat parser on slot reuse to prevent crash (#1763) (#1794)
gapeleon
2026-05-18 01:26:45 +10:00 -
544fc08db2
Check for output_extra.weight when loading Gemma4 assistant models
ik/gemma4_mtp_extra_output
Kawrakow
2026-05-17 14:24:04 +00:00 -
d237d7b398
Fix Gemma4 MTP
ik/fix_qwen35moe_low_mtp_acceptance
Kawrakow
2026-05-17 13:04:00 +00:00