From eaa2510a28b60d43c2210c69cefdf750d5cc119f Mon Sep 17 00:00:00 2001
From: Thomas <119688458+ThomasBaruzier@users.noreply.github.com>
Date: Wed, 23 Jul 2025 13:31:53 +0200
Subject: [PATCH] Add GitHub data: filename sanitization (#640)

---
 ...0 - New argument _ env variable for GGML_SCHED_MAX_COPIES_.md} | 0
 ...ze.md => 104 - Convenience improvements for llama-quantize.md} | 0
 ...ions about weight[j].md => 140 - Questions about weight_j_.md} | 0
 ...nd i-quants_.md => 15 - Will LQER improve k- and i-quants_.md} | 0
 ... => 164 - Latest CPU performance comparison with llama.cpp.md} | 0
 .../{165-Norm RMS Epsilon.md => 165 - Norm RMS Epsilon.md} | 0
 ...LM quantization.md => 166 - Learning more LLM quantization.md} | 0
 ...speed.md => 18 - CPU beating GPU in token generation speed.md} | 0
 ... NUMA situation _.md => 201 - What is the NUMA situation _.md} | 0
 ...mer.md => 211 - help me create an importance matrix primer.md} | 0
 ...R1.md => 223 - Recent performance testing with DeepSeek R1.md} | 0
 ...hing from llama.cpp_ktransformers_ seeking advice_guidance.md} | 0
 ....md => 25 - CPU prompt processing speed for large contexts.md} | 0
 ...erging from llama.cpp.md => 256 - Diverging from llama.cpp.md} | 0
 ...-start Guide coming over from llama.cpp and ktransformers_.md} | 0
 ...1 - 16x3090.md => 266 - Benchmarking DeepSeek R1 - 16x3090.md} | 0
 ...86 - Testing _deepseek-ai_DeepSeek-V3-0324_ model support..md} | 0
 ... _compilade_s PR 12557 and _jukofyork_s quantization ideas.md} | 0
 ...d => 316 - Mainline is now copying stuff from ik_llama.cpp.md} | 0
 ...k_llama.cpp.md => 319 - KTransformers copying ik_llama.cpp.md} | 0
 ...asy way to repack an existing GGUF so it could be used wit.md} | 0
 ..._ks_ performs great on gemma-3-27b-it-qat-q4_0-unquantized.md} | 0
 ... 
prompt with gpu.md => 350 - Maverick slow prompt with gpu.md} | 0 ...LAs are born equal.md => 354 - Not all MLAs are born equal.md} | 0 ...parisons.md => 357 - Qwen3 - early performance comparisons.md} | 0 ...ion experiments.md => 359 - Qwen3 quantization experiments.md} | 0 github-data/discussions/{372-multy gpu.md => 372 - multy gpu.md} | 0 ...tion.md => 384 - ik_llama.cpp issues on an old workstation.md} | 0 ... - Qwen3 235B performance on Intel Xeon Scalable processor.md} | 0 ...ing quantized models.md => 393 - Creating quantized models.md} | 0 ....md => 395 - Why does imatrix not tokenize special tokens_.md} | 0 ...est settings for Maverick - Dual CPU Xeon 8480_ - RTX 3090.md} | 0 ...using `-sm row`.md => 397 - KV split while using _-sm row_.md} | 0 ... 399 - Qwen 30b.A3b IK_LCPP comparisons on lowspec machine.md} | 0 ...all bitnet _or other cpu models_ on a fresh termux aarch64.md} | 0 ...- Tool Calling and Structured Response _Json Mode_ support.md} | 0 ... Cookers Basic Guide.md => 434 - Quant Cookers Basic Guide.md} | 0 ...md => 451 - Context reuse _ context shift for long prompts.md} | 0 ...9 - qwen3 metrics on ancient hardware _2x xeon Vs 2x P100_.md} | 0 .../discussions/{466-A curiosity..md => 466 - A curiosity..md} | 0 ...R1-0528 ik quants!.md => 477 - DeepSeek-R1-0528 ik quants_.md} | 0 ... 
=> 491 - -rtr actually hurts prompt t_s for large ubatch_.md} | 0 .../discussions/{519-Android Build.md => 519 - Android Build.md} | 0 ...ial requant feature to save compute and time during tests..md} | 0 ...PU Layer Offloading Strategy in ik_llama.cpp for Multi GPU.md} | 0 ...upport and thanks.md => 543 - dots.llm1 support and thanks.md} | 0 .../{545-Vulkan support_.md => 545 - Vulkan support_.md} | 0 ...=> 548 - Poor performance with bf16 model on Qwen3 30B-A3B.md} | 0 ...llama.cpp for Armv8.0.md => 556 - ik_llama.cpp for Armv8.0.md} | 0 ...iscussion.md => 562 - AMD GPU Vulkan _ ROCm_HIP Discussion.md} | 0 ...DA PR here..md => 564 - Maybe an interesting CUDA PR here..md} | 0 ... cache rm operation.md => 586 - Slow KV cache rm operation.md} | 0 ....md => 590 - How important is Vulkan back-end development_.md} | 0 ...y speed improvement in generation_ so want to understand i.md} | 0 ...ent on x64_.md => 594 - Is AVX2 a hard requirement on x64_.md} | 0 ...99-mla matrix absorbtion.md => 599 - mla matrix absorbtion.md} | 0 ...logical Quant_CUDA combinations -- How to know what works_.md} | 0 .../{619-gpu p2p utilization.md => 619 - gpu p2p utilization.md} | 0 ...isoned prompt_.md => 621 - Deepseek v3_r1 poisoned prompt_.md} | 0 ..._.md => 623 - Quantizing panels_bundles instead of blocks_.md} | 0 ...on evaluation.md => 63 - LLaMA-3.2 quantization evaluation.md} | 0 ...d => 8 - New quantization types IQ2_K_ IQ3_K_ IQ4_K_ IQ5_K.md} | 0 .../{82-4bpw GGML TYPE_.md => 82 - 4bpw GGML TYPE_.md} | 0 github-data/discussions/{95-Bitnet.md => 95 - Bitnet.md} | 0 ...ug_ K cache without FA.md => 103 - Bug_ K cache without FA.md} | 0 ...e ggml library_.md => 133 - Refactor_ update ggml library_.md} | 0 ...st_ steps how to compile as cmake i struction on the origi.md} | 0 ...e on MSVC 2022.md => 160 - Bug_ Can_t compile on MSVC 2022.md} | 0 ...167 - Bug_ Unable to quantize Falcon 10B 1.58 bitnet model.md} | 0 ...83-Refactor_ iqk_mul_mat.md => 183 - Refactor_ iqk_mul_mat.md} | 0 
...Refactor_ remove usage of Q8_1 for activation quantization.md} | 0 ... system_prompt on llama-server at runtime breaks parallel .md} | 0 ... Bug_ Compliation Error for Intel_R_ Xeon_R_ Gold 6326 CPU.md} | 0 ...s the iqk_mul_mat.cpp support 1.58-bit quantization model_.md} | 0 .../{214-AVX512 build error.md => 214 - AVX512 build error.md} | 0 ...broken.md => 217 - Bug_ CPU FA with fp16 K-cache is broken.md} | 0 ... => 224 - Bug_ IQK_FA_ALL_QUANTS causes failure to compile.md} | 0 ...ent FA usage on CUDA when K and V head sizes are different.md} | 0 ...28 - Feature Request_ create tool to offline repack models.md} | 0 ...cking.md => 230 - Weird assert when using online repacking.md} | 0 ...45 - Bug_ Perplexity returns NaN with IQ4_KSS quantisation.md} | 0 ...=> 249 - CUDA_ results for MoE models are not reproducible.md} | 0 .../issues/{254-Split-mode row.md => 254 - Split-mode row.md} | 0 ...st_ dynamic layer by layer offloading during prompt proces.md} | 0 ... - Bug_ mla_2 in llama-server will crash when request done.md} | 0 ...e Request_ Improve CPU processing speed for large contexts.md} | 0 ...1 - 16x3090.md => 263 - Benchmarking DeepSeek R1 - 16x3090.md} | 0 ... - Feature Request_ HugePage mmap alloc for DeepSeek V3_R1.md} | 0 ...gression computing _wk_b_ tensors on the fly after PR _265.md} | 0 ...erformance.md => 281 - Bug_ Strange dips in TG performance.md} | 0 ...5 - llama-perplexity giving all NaNs on unsloth Q8_0 quant.md} | 0 ... Bug_ some ifdefs missing in ggml_src_iqk_iqk_quantize.cpp.md} | 0 ...t.md => 293 - Feature Request_ IQ6_K row interleaved quant.md} | 0 ...rical stability issue with experimental quant of DeepSeek-.md} | 0 ...> 297 - Update gguf-py scripts to support new quant types..md} | 0 ...> 30 - Bug_ Appcrash on Windows 7 with GGML_USE_IQK_MULMAT.md} | 0 ... => 300 - Bug_ IQK_FA_ALL_QUANTS causes failure to compile.md} | 0 ...put when using DeepSeek-V3-0324-IQ2_K_R4 on mixed CPU _ 4 .md} | 0 ..._.md => 306 - Confused by the -mla flag. 
What_s supported_.md} | 0 ...Compiling for arm64_ error_ cannot convert _const uint32x4_t_ to _.md} | 0 .../issues/{314-Llama 4 Support_.md => 314 - Llama 4 Support_.md} | 0 ... decoding support.md => 322 - Speculative decoding support.md} | 0 ...generates garbage with longer context _64K_ the issue is n.md} | 0 ...late issues.md => 339 - Bug_ bitnet2b_2501 template issues.md} | 0 ...when processing prompt lengths that are not a multiple of .md} | 0 ... model architecture_ _cohere2_ when trying to load Command.md} | 0 ...45-build question newbie.md => 345 - build question newbie.md} | 0 ... for Windows _.md => 353 - Binaries releases for Windows _.md} | 0 ... => 358 - Bug_ IQK_FA_ALL_QUANTS causes failure to compile.md} | 0 ... => 361 - Bug_ Build not detecting some supported ARM CPUs.md} | 0 ... README language is vague wrt. _quantization improvements_.md} | 0 ...h output when using flash attention using Mistral-Small-I.md} | 0 ...et-b1.58.md => 365 - Bug_ Updated BitNet arch bitnet-b1.58.md} | 0 ...=> 367 - Bug_ IQ1_S_R4_ IQ1_M_R4 failed on Qwen3-235B-A22B.md} | 0 ...24 can_t load newest UD quants _with MLA_. Older quant wor.md} | 0 ...model architecture_ _deci_ _when loading Llama-3_1-Nemotro.md} | 0 ... - Feature Request_ Use ik_llama.cpp with llama-cpp-python.md} | 0 ..._ Cannot build on WoA.md => 379 - Bug_ Cannot build on WoA.md} | 0 ... 
of generation.md => 380 - Drop at the start of generation.md} | 0 ...p_ggml_src_ggml-cuda_fattn.cu_66_ fatal error after latest.md} | 0 ...DeepSeek R1T Chimera causes _llama_model_load_ error loadi.md} | 0 ....md => 387 - Bug_ bitnet 1.58 on termux segmentation fault.md} | 0 ...s.md => 388 - Bug_ Clash with mainline llama.cpp .so files.md} | 0 ...389 - Bug_ llama-batched-bench crashed with batch size _2.md} | 0 ...ccess.md => 398 - Bug_ -fmoe causing illegal memory access.md} | 0 ...ture Request_ Support for function calling in llama-server.md} | 0 ...pile..md => 412 - Bug_ Static asserts trip during compile..md} | 0 ...100).md => 419 - qwen3 metrics in expert parallel_2x P100_.md} | 0 ...on is broken.md => 420 - Bug_ standard attention is broken.md} | 0 ...ompile failure undefined reference to _void mul_mat_q_case.md} | 0 ... Bug_ CUDA error_ an illegal memory access was encountered.md} | 0 ... Refactor_ GGUF v14 broke compatibility with IQx_KS quants.md} | 0 ...st_ CORS support.md => 433 - Feature Request_ CORS support.md} | 0 ...t.md => 436 - Bug_ Saving the prompt cache causes Segfault.md} | 0 ... Feature Request_ support intel amx for further accelerate.md} | 0 ...a sampler.md => 440 - Feature Request_ Top n-sigma sampler.md} | 0 ...or_ Error C2676.md => 447 - Compilation Error_ Error C2676.md} | 0 ...ormance regression.md => 450 - Bug_ Performance regression.md} | 0 .../{452-Falcon H1 Support.md => 452 - Falcon H1 Support.md} | 0 ...e is never reused in OpenAI compatible Chat Completion api.md} | 0 ..._MULMAT.md => 456 - Bug_ no compilation without IQK_MULMAT.md} | 0 ....md => 463 - Research_ V100 Flash Attention Implementation.md} | 0 ... - Bug_ The streaming every couple of rows blocks for 5-8s.md} | 0 ...oes not send data_ _DONE_ for OpenAI-compatible streaming .md} | 0 ... 
Bug_ Don_t build ggml-aarch64 regardless of CPU arch type.md} | 0 ...ression in PP throughput after Pull _461 _...R4 CUDA impl_.md} | 0 ...ce divergence.md => 476 - Research_ performance divergence.md} | 0 ...ckend_cuda_graph_compute_ disabling CUDA graphs due to GPU.md} | 0 ...=> 485 - Bug_ Illegal Memory Access loading model to CUDA1.md} | 0 ... #461.md => 490 - Bug_ Performance drop with 14292913 _461.md} | 0 ...uantize method.md => 498 - question_ about quantize method.md} | 0 ...=> 499 - Bug_ cache quantization crash with IQK_FORCE_BF16.md} | 0 ...> 500 - Bug_ Insane cudaMalloc OOM Error on Dual 3090 GPUs.md} | 0 ....md => 503 - Bug_ server_cli fails with segmentation fault.md} | 0 ...patible gguf models _.md => 507 - Compatible gguf models _.md} | 0 ...rror on RTX 5090 _Compute Capability 12.0_ _no kernel imag.md} | 0 ...ng semi layers to some GPUs with -ot_ TG t_s performance t.md} | 0 ...d.md => 522 - Bug_ disabling CUDA graphs due to mul_mat_id.md} | 0 ... drop after https_github.com_ikawrakow_ik_llama.cpp_pull_5.md} | 0 ... Webui improvement _481 core dump with a certain question..md} | 0 ...second prompt..md => 530 - Getting crash on second prompt..md} | 0 ...prompt.md => 538 - Bug_ GGML_ASSERT failed at first prompt.md} | 0 .../{539-Bug_ garbage output.md => 539 - Bug_ garbage output.md} | 0 ...d => 551 - Feature Request_ Support for Falcon Edge series.md} | 0 ... 
561 - Feature Request_ Tencent Hunyuan-A13B model support.md} | 0 ...pport.md => 568 - Feature Request_ ERNIE MoE Model Support.md} | 0 ...l_compute_forward_sum_rows_f32_ ffn_moe_weights_sum-60_ fo.md} | 0 ...er.md => 575 - Bug_ llama-server crash with sampling order.md} | 0 ...ama-server crash with _Deepseek2 does not support K-shift_.md} | 0 ...L Compilation Error_ undefined references to _iqk_mul_mat_.md} | 0 ...commit broke llama-cli on Windows - mmq.cuh_107_ fatal err.md} | 0 ...7 - Feature Request_ Add THUDM_GLM-4-MoE-100B-A10B support.md} | 0 ...4.md => 60 - Bug_ Illegal instruction on NEON and Q4_0_4_4.md} | 0 ...st_ Port --reasoning-budget from main llamacpp _llamaserve.md} | 0 ...a-imatrix crashing.md => 601 - Bug_ llama-imatrix crashing.md} | 0 ...issing from GGMLQuantizationType - gguf_reader.py script c.md} | 0 ...ffload.md => 614 - Feature Request_ port no-mmproj-offload.md} | 0 ...ion not working.md => 615 - Bug_ Gemma3 Vision not working.md} | 0 ... Bug_ undefined symbol errors after successful compilation.md} | 0 ... IQ1_M.md => 626 - Feature Request_ Add IQK GEMM for IQ1_M.md} | 0 ...arallelism.md => 627 - Feature Request_ Tensor Parallelism.md} | 0 ...rformance _Windows_ is significantly worse than single-GPU.md} | 0 ...67 - Feature Request_ Elliminate_reduce unnecessary copies.md} | 0 ...on't compile on MSVC.md => 88 - Bug_ Won_t compile on MSVC.md} | 0 ... 
KV cache produces garbage in situation where llama.cpp do.md} | 0 ...e GPU.md => 1 - Offload Bitnet token embeddings to the GPU.md} | 0 ...2.md => 10 - iq4_k_ speedup quantization by a factor of _2.md} | 0 ...flash attention.md => 101 - Enable q6_0 in flash attention.md} | 0 ....md => 102 - Add support for Granite and GraniteMoE models.md} | 0 ...he without FA.md => 105 - Fix quantized k-cache without FA.md} | 0 .../{106-Bitnet changes.md => 106 - Bitnet changes.md} | 0 ...lementation.md => 107 - Faster IQ1_BN Metal implementation.md} | 0 ...d => 108 - Another Bitnet performance improvement on Metal.md} | 0 ...net CUDA improvements.md => 109 - Bitnet CUDA improvements.md} | 0 ...uantization.md => 11 - Faster iq3_k and iq5_k quantization.md} | 0 ...=> 110 - Bitnet_ use the fused mul-silu in the FFN network.md} | 0 ...s.md => 111 - Use fused mul - unary op also for MoE models.md} | 0 ...{112-Faster MoE inference.md => 112 - Faster MoE inference.md} | 0 ...{113-Trellis quantization.md => 113 - Trellis quantization.md} | 0 ...y please!).md => 114 - MMQ Kernel for Q6_0 _pretty please_.md} | 0 .../pull_requests/{115-MMQ for Q6_0.md => 115 - MMQ for Q6_0.md} | 0 ...0 instead of Q5_1 for tensors incompatible with IQ5_K_Q5_K.md} | 0 ...gies tweaks.md => 117 - Some minor quant strategies tweaks.md} | 0 .../pull_requests/{118-IQ4_NL_X4.md => 118 - IQ4_NL_X4.md} | 0 github-data/pull_requests/{119-Q4_0_R4.md => 119 - Q4_0_R4.md} | 0 ..._ allow it to detect ternary nets and quantize accordingly.md} | 0 github-data/pull_requests/{120-Q8_0_R4.md => 120 - Q8_0_R4.md} | 0 github-data/pull_requests/{121-Q5_0_R4.md => 121 - Q5_0_R4.md} | 0 github-data/pull_requests/{122-Q6_0_R4.md => 122 - Q6_0_R4.md} | 0 .../pull_requests/{123-IQ4_XS_R4.md => 123 - IQ4_XS_R4.md} | 0 ...iq2_bn_r4_ fastest Bitnet CPU implementation on the planet.md} | 0 ...ements on ARM_NEON.md => 125 - R4 improvements on ARM_NEON.md} | 0 ..._x4 to iq4_nl_r4.md => 126 - Rename iq4_nl_x4 to iq4_nl_r4.md} | 0 
.../{127-Q4_0_R4 on CUDA.md => 127 - Q4_0_R4 on CUDA.md} | 0 ...ter IQ4_XS_R4 on Zen4.md => 128 - Faster IQ4_XS_R4 on Zen4.md} | 0 github-data/pull_requests/{129-Q4_K_R4.md => 129 - Q4_K_R4.md} | 0 ...odels.md => 13 - Adding IQ2_TN for use with ternary models.md} | 0 github-data/pull_requests/{130-Q6_K_R4.md => 130 - Q6_K_R4.md} | 0 ....md => 131 - Slightly faster Q4_K_R4 and IQ4_XS_R4 on Zen4.md} | 0 github-data/pull_requests/{132-Q5_K_R4.md => 132 - Q5_K_R4.md} | 0 github-data/pull_requests/{134-Q3_K_R4.md => 134 - Q3_K_R4.md} | 0 ...s.md => 135 - Better ARM_NEON implementation for R4 quants.md} | 0 github-data/pull_requests/{136-Q2_K_R4.md => 136 - Q2_K_R4.md} | 0 ...iq4_nl_r4.md => 137 - Fix AVX2 implementation of iq4_nl_r4.md} | 0 github-data/pull_requests/{138-IQ4_K_R4.md => 138 - IQ4_K_R4.md} | 0 ...ter R4 quants on Zen4.md => 139 - Faster R4 quants on Zen4.md} | 0 .../pull_requests/{14-Adding IQ6_K.md => 14 - Adding IQ6_K.md} | 0 ...=> 141 - Q8_K_R8_ Fastest quantized matrix multiplications.md} | 0 ...f16 rows .md => 142 - BF16_R16 - 16 interleaved bf16 rows.md} | 0 ...S_R4 on AVX2.md => 143 - Slightly faster IQ4_XS_R4 on AVX2.md} | 0 ...VX2_Zen4.md => 144 - Slightly faster IQ4_K_R4 on AVX2_Zen4.md} | 0 github-data/pull_requests/{145-IQ3_K_R4.md => 145 - IQ3_K_R4.md} | 0 github-data/pull_requests/{146-IQ2_K_R4.md => 146 - IQ2_K_R4.md} | 0 ...run time.md => 147 - Be able to repack tensors at run time.md} | 0 ...er matrix x vector on Zen4_AVX2 for iq2_k_r4_ iq3_k_r4_ iq.md} | 0 github-data/pull_requests/{149-IQ5_K_R4.md => 149 - IQ5_K_R4.md} | 0 .../pull_requests/{150-IQ4_KS_R4.md => 150 - IQ4_KS_R4.md} | 0 github-data/pull_requests/{151-fix typo.md => 151 - fix typo.md} | 0 .../pull_requests/{152-IQ3_XXS_R4.md => 152 - IQ3_XXS_R4.md} | 0 .../pull_requests/{153-IQ3_XXS_R4.md => 153 - IQ3_XXS_R4.md} | 0 .../pull_requests/{154-IQ2_XXS_R4.md => 154 - IQ2_XXS_R4.md} | 0 .../pull_requests/{155-IQ2_XS_R4.md => 155 - IQ2_XS_R4.md} | 0 
github-data/pull_requests/{156-IQ2_S_R4.md => 156 - IQ2_S_R4.md} | 0 ...i-quants improvements.md => 157 - R4 i-quants improvements.md} | 0 ...aster R4 legacy quants.md => 158 - Faster R4 legacy quants.md} | 0 .../pull_requests/{16-Fix Makefile.md => 16 - Fix Makefile.md} | 0 .../pull_requests/{161-MSVC fixes.md => 161 - MSVC fixes.md} | 0 github-data/pull_requests/{162-IQ3_S_R4.md => 162 - IQ3_S_R4.md} | 0 ....md => 163 - q4_0_r4_ Use AVX2 version for matrix x vector.md} | 0 .../{168-Falcon3 changes.md => 168 - Falcon3 changes.md} | 0 ...s.md => 169 - Be able to re-quantize MS BitNet I2_S models.md} | 0 ...line - Aug 12 2024.md => 17 - Merge mainline - Aug 12 2024.md} | 0 ...70-MoE fix for R4 quants.md => 170 - MoE fix for R4 quants.md} | 0 ....md => 171 - Fix lower FA performance for even batch sizes.md} | 0 ... improvements.md => 172 - CPU Flash Attention improvements.md} | 0 ...improvements.md => 173 - More Flash Attention improvements.md} | 0 ...f16_r16.md => 174 - On Zen4 repack fp16 models to bf16_r16.md} | 0 ...16 support on AVX2.md => 175 - Better BF16 support on AVX2.md} | 0 ...eek V3 support added.md => 176 - Deepseek V3 support added.md} | 0 ...77-Update chat templates.md => 177 - Update chat templates.md} | 0 ...Q8_0, IQ4_XS).md => 178 - Interleave 8 rows _Q8_0_ IQ4_XS_.md} | 0 ...ce improvements.md => 179 - Minor performance improvements.md} | 0 ...k MLA Optimizations.md => 180 - Deepseek MLA Optimizations.md} | 0 github-data/pull_requests/{181-Various.md => 181 - Various.md} | 0 ...2_Zen4.md => 182 - Faster Q4_K_R4 and Q5_K_R4 on AVX2_Zen4.md} | 0 .../{184-Deepseek-Lite.md => 184 - Deepseek-Lite.md} | 0 ...1.5 bpw quants.md => 185 - IQ1_S_R4_ better 1.5 bpw quants.md} | 0 ..._gemv.md => 186 - iq1_s_r4_ slightly faster NEON gemm_gemv.md} | 0 ...75 bpw quants.md => 187 - IQ1_M_R4_ better 1.75 bpw quants.md} | 0 .../{188-Add optional MLA.md => 188 - Add optional MLA.md} | 0 ...8.md => 189 - Rename q4_0_r4_ q8_0_r4 and iq4_xs_r4 to _r8.md} | 0 ...9-Skip 
barriers of noops.md => 19 - Skip barriers of noops.md} | 0 ...tiguous rms norm.md => 190 - cuda_ non-contiguous rms norm.md} | 0 ...d => 191 - Add additional checks for iq1_s_r4 quantization.md} | 0 .../pull_requests/{192-Revert #79.md => 192 - Revert _79.md} | 0 github-data/pull_requests/{193-RPC sync.md => 193 - RPC sync.md} | 0 ... Q8_K_128 for IQ1_S_R4 and IQ1_M_R4 matrix multiplications.md} | 0 ...Optimizations V2.md => 195 - Deepseek MLA Optimizations V2.md} | 0 ...kernels.md => 197 - FA_ Add option to build all FA kernels.md} | 0 ...Load all MoE experts during warmup and make warmup 1 token.md} | 0 ...Offload Bitnet token embeddings to the GPU - the right way.md} | 0 ...d => 20 - iq2_k_ slightly better bpw - accuracy compromise.md} | 0 ...port (CPU only).md => 200 - DeepSeek FA support _CPU only_.md} | 0 ...rprotectiveness.md => 202 - Fix imatrix overprotectiveness.md} | 0 ...qk_mul_mat on AVX512 systems that are missing BF16 support.md} | 0 ...prompt processing.md => 205 - Faster MLA prompt processing.md} | 0 ...-cache for MLA.md => 206 - MLA_ allow Q8_0 K-cache for MLA.md} | 0 ...TG for GQA models.md => 207 - Faster CPU TG for GQA models.md} | 0 ...08 - Q8_KV_ 8-bit quantization type targeting the KV cache.md} | 0 ...uantize_stats_ print rmse and max error as fraction of _x_.md} | 0 .../{210-Repack also experts.md => 210 - Repack also experts.md} | 0 ...M_GEMV for IQ1_S.md => 212 - Optimized GEMM_GEMV for IQ1_S.md} | 0 ..._gemv for legacy quants when row size is not divisible by .md} | 0 ...Trying to fix confusion betweem HAVE_FANCY_SIMD and AVX512.md} | 0 ...s really fixes the confusion between AVX512 and FANCY_SIMD.md} | 0 ...gy for attention matrix multiplications when generating to.md} | 0 ...ns.md => 219 - Fuse MoE up and gate matrix multiplications.md} | 0 ...uantization for Q8_K.md => 22 - AVX2 quantization for Q8_K.md} | 0 github-data/pull_requests/{220-Fix #217.md => 220 - Fix _217.md} | 0 ...hmark.md => 225 - Examples _ Add new sweep-bench 
benchmark.md} | 0 ...226 - Fix compilation error with IQK_FA_ALL_QUANTS enabled.md} | 0 ..._up and ffn_gate.md => 229 - Fused MoE ffn_up and ffn_gate.md} | 0 .../pull_requests/{23-iq4_k tweak.md => 23 - iq4_k tweak.md} | 0 github-data/pull_requests/{231-Fix #230.md => 231 - Fix _230.md} | 0 ...user the option to override where model weights are stored.md} | 0 ...ghtly faster CUDA MLA.md => 233 - Slightly faster CUDA MLA.md} | 0 .../{234-Faster MLA on CUDA.md => 234 - Faster MLA on CUDA.md} | 0 ...e.md => 235 - Option to use MLA without a transposed cache.md} | 0 ...36-Feat_lock free server.md => 236 - Feat_lock free server.md} | 0 ...compute buffers.md => 237 - Reduce size of compute buffers.md} | 0 ... => 238 - A better way to measure the cost of ggml_barrier.md} | 0 ... Expert Reduction.md => 239 - SER - Smart Expert Reduction.md} | 0 ...p_ minor improvement.md => 24 - softcap_ minor improvement.md} | 0 ...{240-Flash MLA (CPU only).md => 240 - Flash MLA _CPU only_.md} | 0 ...Flash Attention .md => 241 - DeepSeek CUDA Flash Attention.md} | 0 .../{243-Better FlashMLA.md => 243 - Better FlashMLA.md} | 0 ...> 244 - Custom quantization rules with regular expressions.md} | 0 ...t processing.md => 246 - Faster FlashMLA prompt processing.md} | 0 .../{247-FlashMLA on CUDA.md => 247 - FlashMLA on CUDA.md} | 0 ...on on CUDA.md => 248 - Faster MoE token generation on CUDA.md} | 0 ...-DeepSeek imatrix stuff.md => 250 - DeepSeek imatrix stuff.md} | 0 ... fp32 for FlashMLA.md => 251 - Try using fp32 for FlashMLA.md} | 0 ... => 252 - MLA-2_ Allow usage of q8_0 for KV cache on CUDA.md} | 0 ... 
- FlashMLA-2 _CPU_ faster and smaller compute buffer size.md} | 0 ...> 259 - Prepare wk_b tensors of DeepSeek models on the fly.md} | 0 ...60 - FlashMLA-2_ reduce compute buffer size _CUDA and CPU_.md} | 0 ...ile time option to use bf16 for quants without MMQ kernels.md} | 0 github-data/pull_requests/{262-Fix #261.md => 262 - Fix _261.md} | 0 ...md => 264 - Make Q8_0 KV cache work with FlasMLA-2 on CUDA.md} | 0 ...A-2.md => 265 - Allow q8_0 cache on the CPU for FlashMLA-2.md} | 0 ...n CUDA.md => 268 - Prevent FlashMLA-1 from running on CUDA.md} | 0 ...e_forward_dup_q.md => 269 - Fix ggml_compute_forward_dup_q.md} | 0 .../pull_requests/{27-Faster Gemma2.md => 27 - Faster Gemma2.md} | 0 ...md => 270 - Honor mmap setting when using tensor overrides.md} | 0 ...t models to row-interleaved quants using the quantize tool.md} | 0 ...md => 273 - FlashMLA-3_ the best of both worlds _CPU only_.md} | 0 ...274 - Specify tensor name regex for tensors to be repacked.md} | 0 ...> 275 - Fix bug_ missing parentheses in logical expression.md} | 0 ...ort (text only).md => 276 - Add Gemma3 support _text only_.md} | 0 ...the CPU.md => 277 - Attempt to improve FlashMLA on the CPU.md} | 0 ... 
on Linux.md => 278 - Test transparent huge pages on Linux.md} | 0 .../{279-Fighting with cmake.md => 279 - Fighting with cmake.md} | 0 .../{28-Binary KQ mask.md => 28 - Binary KQ mask.md} | 0 ...80 - Native build ooption for CUDA when GGML_NATIVE is set.md} | 0 ...peed.md => 282 - Improve DeepSeek batched processing speed.md} | 0 ...implementation.md => 283 - CUDA_ better MoE implementation.md} | 0 ...h_ enable having different number of threads for tg and pp.md} | 0 ...r DeepSeek-R1_.md => 287 - Is this better for DeepSeek-R1_.md} | 0 ...d => 289 - Update sweep bench _depracating .jsonl support_.md} | 0 ...{290-mmap backed KV cache.md => 290 - mmap backed KV cache.md} | 0 ...R8.md => 291 - Disable Zen4 optimizations for Q8_0_Q8_0_R8.md} | 0 ...md => 292 - Use bf16 instead of fp16 block scales for q8_1.md} | 0 ...sor row size is multiple of block size also when quantizin.md} | 0 ...ization improvements.md => 295 - Quantization improvements.md} | 0 ...ate gguf-py constants.md => 298 - Update gguf-py constants.md} | 0 ...uants.md => 299 - Additional guards for interleaved quants.md} | 0 ...erge mainline llama.cpp.md => 3 - Merge mainline llama.cpp.md} | 0 github-data/pull_requests/{301-Fix #300.md => 301 - Fix _300.md} | 0 ...improvements (2).md => 302 - Quantization improvements _2_.md} | 0 ...to q8_2.md => 303 - Fix ARM_NEON build failure due to q8_2.md} | 0 ...ssing.md => 307 - Metal_ much faster MoE prompt processing.md} | 0 ...rrors on ARM.md => 309 - Fix GCC compilation errors on ARM.md} | 0 ...disabled.md => 31 - Fix build when iqk_mul_mat is disabled.md} | 0 ...-Metal_ FA and FlashMLA.md => 310 - Metal_ FA and FlashMLA.md} | 0 ...RM.md => 311 - Add -flax-vector-conversions for GCC on ARM.md} | 0 ...2_XS quantization.md => 312 - Improved IQ2_XS quantization.md} | 0 ...ed to synchronize before using device to host async memcpy.md} | 0 ...ons.md => 315 - Try not repacking q8_0 for FA computations.md} | 0 ...17-Add copyright notices.md => 317 - Add copyright 
notices.md} | 0 ...p authors.md => 318 - Use links for ggml_llama.cpp authors.md} | 0 .../{32-Zen4 Flash Attention.md => 32 - Zen4 Flash Attention.md} | 0 ...320 - Guard against attempts to use MLA for non-MLA models.md} | 0 ...upport (text only).md => 321 - LlaMA-4 support _text only_.md} | 0 .../{324-Correct L4 rms_norm.md => 324 - Correct L4 rms_norm.md} | 0 .../{325-Fix KLD precision.md => 325 - Fix KLD precision.md} | 0 ...d => 326 - WIP Compute per layer LIM Scores during imatrix.md} | 0 ...IQ1_M quantization.md => 327 - Improved IQ1_M quantization.md} | 0 ...cs.md => 328 - imatrix_ collect layer influence statistics.md} | 0 ...29 - Add ability to hide imatrix details in llama-quantize.md} | 0 ... Do not process prompts containing binary data for escapes.md} | 0 ...size 256.md => 330 - Allow q8_0 KV cache for head size 256.md} | 0 ...fr q4_0_r8.md => 331 - Better gemm_gemv on AVX2 fr q4_0_r8.md} | 0 ...PU).md => 332 - Better TG performance for GQA models _CPU_.md} | 0 ... - Support GLM-4-0414 models based on piDack_s mainline PR.md} | 0 ... termux_android build.md => 336 - Fix termux_android build.md} | 0 ...2501 model.md => 337 - Add support for bitnet2b_2501 model.md} | 0 .../{338-BitNet adjustments.md => 338 - BitNet adjustments.md} | 0 ...dd support for Cohere2.md => 341 - Add support for Cohere2.md} | 0 ...42-Fix LLaMA-4 attention.md => 342 - Fix LLaMA-4 attention.md} | 0 ...expr funcs.md => 343 - cuda_ use switch in constexpr funcs.md} | 0 ...414 Model Support.md => 344 - Add GLM-4-0414 Model Support.md} | 0 .../{346-Fix FA on ARM CPUs.md => 346 - Fix FA on ARM CPUs.md} | 0 ...h flags.md => 347 - Add ability to manually set arch flags.md} | 0 ... q4_1 and q5_1 on Arm.md => 348 - Fix q4_1 and q5_1 on Arm.md} | 0 ... 
division by zero bug.md => 349 - Fix division by zero bug.md} | 0 ...x Zen4 Flash Attention.md => 35 - Fix Zen4 Flash Attention.md} | 0 .../{351-CPU FA improvements .md => 351 - CPU FA improvements.md} | 0 .../{352-Update README.md.md => 352 - Update README.md.md} | 0 ...R from llama.cpp.md => 355 - Apply Qwen3 PR from llama.cpp.md} | 0 ...md => 356 - Add missing enum values for qwen3 and qwen3moe.md} | 0 ...6-Zen4 Flash Attnetion 2.md => 36 - Zen4 Flash Attnetion 2.md} | 0 ...L_QUANTS on AVX2.md => 360 - Fix IQK_FA_ALL_QUANTS on AVX2.md} | 0 .../{364-Fix FA bug on AVX2.md => 364 - Fix FA bug on AVX2.md} | 0 ...> 366 - Add support for new Bitnet model architecture name.md} | 0 ...368 - Trying to fix iq1_s_r4_iq1_m_r4 quantization failure.md} | 0 ...-8.md => 369 - cmake_ force MSVC compiler charset to utf-8.md} | 0 ...7 - Performance improvements for legacy quants on ARM_NEON.md} | 0 ... GQA models .md => 370 - CUDA_ faster FA TG for GQA models.md} | 0 ...ttempt to fix #367.md => 371 - Another attempt to fix _367.md} | 0 ...{374-CUDA_ MMQ for IQ4_KS.md => 374 - CUDA_ MMQ for IQ4_KS.md} | 0 ...to sweep-bench.md => 375 - Add batch warmup to sweep-bench.md} | 0 ...ron models.md => 377 - Support for Llama-3-Nemotron models.md} | 0 ...f16 support.md => 38 - Zen4 Flash Attention - bf16 support.md} | 0 .../{382-Fix DeepSeek FA.md => 382 - Fix DeepSeek FA.md} | 0 ...on CUDA.md => 386 - FlashMLA-3 for DeepSeek models on CUDA.md} | 0 ...iqk_mul_mat.md => 39 - Add support for bf16 to iqk_mul_mat.md} | 0 ... Xeon Gold 6226R.md => 390 - Fix build for Xeon Gold 6226R.md} | 0 ...ix DeepSeek q8_0 cache.md => 391 - Fix DeepSeek q8_0 cache.md} | 0 ...VC build problem..md => 392 - fix some MSVC build problem..md} | 0 ...pSeek GGUFs.md => 394 - Handle incompatible DeepSeek GGUFs.md} | 0 ... 
multi-thread tanh.md => 4 - Simdify and multi-thread tanh.md} | 0 ...f16 support to CUDA.md => 40 - Adding bf16 support to CUDA.md} | 0 ...400 - Fix CUDA DeepSeek FlashMLA-3 with quantized KV cache.md} | 0 ...md => 402 - Fix missing rope_freqs with convert_hf_to_gguf.md} | 0 ... for MoE models.md => 404 - TG improvements for MoE models.md} | 0 .../{405-GPU offload policy.md => 405 - GPU offload policy.md} | 0 ...kernel.md => 406 - Fix race in the CUDA DeepSeek FA kernel.md} | 0 ...DeepSeek FA on CUDA.md => 408 - Faster DeepSeek FA on CUDA.md} | 0 ...ble faster prompt processing with mainline llama.cpp GGUFs.md} | 0 ...pport.md => 41 - iqk_mul_mat_ARM_NEON_ adding bf16 support.md} | 0 ...te.md => 410 - Better CPU FA performance for DeepSeek-Lite.md} | 0 ... models.md => 411 - Fix imatrix calculation for MLA models.md} | 0 ... CUDA FA on Touring.md => 413 - Fix new CUDA FA on Touring.md} | 0 .../{415-Fix SER (CPU).md => 415 - Fix SER _CPU_.md} | 0 .../{416-Fix SER (CUDA).md => 416 - Fix SER _CUDA_.md} | 0 ... 
=> 417 - CUDA_ quantized GEMM for for IQ4_K_ IQ5_K_ IQ6_K.md} | 0 ...=> 418 - CUDA_ quantized GEMM for for IQ2_KS_ IQ2_K_ IQ3_K.md} | 0 ...{42-Adding fused rms_norm.md => 42 - Adding fused rms_norm.md} | 0 ...n on the CPU.md => 421 - Fix standard attention on the CPU.md} | 0 ....25 bpw quants.md => 422 - Adding IQ5_KS - 5.25 bpw quants.md} | 0 ....md => 424 - Adding forgotten template instance for iq5_ks.md} | 0 ...eaved IQ5_KS.md => 426 - IQ5_KS_R4_ row-interleaved IQ5_KS.md} | 0 ...7 - Fix AVX2 implementation of IQ4_K_ IQ4_KS_ IQ5_K_ IQ6_K.md} | 0 ..._KS.md => 428 - Zen4_ Faster PP for IQ2_KS_ IQ4_KS_ IQ5_KS.md} | 0 ...md => 429 - Option to enable or disable the CPU FA kernels.md} | 0 ...r PP on Zen4.md => 43 - iq2_tn_ slightly faster PP on Zen4.md} | 0 ...le multi-add for now.md => 430 - Disable multi-add for now.md} | 0 ...en MMQ ref and typo.md => 431 - Forgotten MMQ ref and typo.md} | 0 ...actor iqk_mul_mat.cpp.md => 435 - Refactor iqk_mul_mat.cpp.md} | 0 ...438 - Another attempt to fix the illegal memory access bug.md} | 0 ...ug fixes from mainline.md => 439 - Bug fixes from mainline.md} | 0 ...> 44 - Adding IQ1_TN - 1.6875 bpw for TriLM ternary models.md} | 0 ...PU inference.md => 441 - Trellis quants with CPU inference.md} | 0 .../{442-CUDA call tracer.md => 442 - CUDA call tracer.md} | 0 ...rategies.md => 443 - Streamline a bit the quant strategies.md} | 0 .../{444-gguf-split _ update.md => 444 - gguf-split _ update.md} | 0 ...2 code branch.md => 445 - Fix typo in non-AVX2 code branch.md} | 0 ...-Fix bug in MMVQ kernel.md => 446 - Fix bug in MMVQ kernel.md} | 0 ...{448-Fix MSVC compilation.md => 448 - Fix MSVC compilation.md} | 0 ... Legacy quants conversion schemes in convert_hf_to_gguf.py.md} | 0 ... 
support for IQ1_TN.md => 45 - Add CUDA support for IQ1_TN.md} | 0 ...ter IQ3_KT and IQ4_KT.md => 453 - Faster IQ3_KT and IQ4_KT.md} | 0 ...dd support for FP8 GGUF creation and re-quantization _WIP_.md} | 0 ..._MUL_MAT option.md => 457 - Remove GGML_IQK_MUL_MAT option.md} | 0 ...guf-py constants.md => 458 - Add missing gguf-py constants.md} | 0 ...etal implementation.md => 46 - IQ1_TN Metal implementation.md} | 0 ...ls for KT quants.md => 460 - aarch64 kernels for KT quants.md} | 0 ... implementation for IQ2_K_R4_ IQ3_K_R4_ IQ4_K_R4_ IQ5_K_R4.md} | 0 ...md => 462 - CUDA GEMM and GEMV for IQ4_KS_R4 and IQ5_KS_R4.md} | 0 ...fault to true.md => 465 - Set cache_prompt default to true.md} | 0 ... 468 - Minor _2_ iq2_ks TG performance improvement on CUDA.md} | 0 ...- Replace MLA-specific KV cache with the standard KV cache.md} | 0 ...VX2.md => 47 - iq2_tn_ slightly better performance on AVX2.md} | 0 ...ompatibility.md => 470 - Send _DONE_ for OAI compatibility.md} | 0 ... quants.md => 471 - NEON implementation for trellis quants.md} | 0 ...eplace MLA-specific KV cache with the standard KV cache V2.md} | 0 ...s..md => 475 - Metal implementatio for the trellis quants..md} | 0 ...orgotten refs and typo.md => 478 - forgotten refs and typo.md} | 0 .../{48-AVX2 Flash Attention.md => 48 - AVX2 Flash Attention.md} | 0 .../{480-Rpc improvement.md => 480 - Rpc improvement.md} | 0 .../{481-Webui improvement.md => 481 - Webui improvement.md} | 0 ...g.md => 482 - Trellis quants_ faster CPU prompt processing.md} | 0 ...convert_hf_to_gguf.py _ conversion from hf weights to Q6_0.md} | 0 ...lis implementation.md => 484 - BF16 Trellis implementation.md} | 0 ...-Adding the XTC sampler.md => 486 - Adding the XTC sampler.md} | 0 ...it.md => 487 - Make sure MMVQ is supported before using it.md} | 0 ...er CPU prompt processing for Trellis quants and MoE models.md} | 0 ...top-n-sigma sampler.md => 489 - Adding top-n-sigma sampler.md} | 0 ...M_NEON Flash Attention.md => 49 - ARM_NEON Flash 
Attention.md} | 0 ... for IQ1_S_R4.md => 492 - CUDA implementation for IQ1_S_R4.md} | 0 ...md => 493 - MMQ implementation for IQ4_KS_R4 and IQ5_KS_R4.md} | 0 ...DA implementation.md => 494 - IQ1_M_R4 CUDA implementation.md} | 0 ...ffn_up and ffn_gate are of the same type before using fmoe.md} | 0 ....md => 496 - Quick hack_ add the MLA flag to llama_hparams.md} | 0 ... => 497 - Make prompt cache saving and restoring MLA aware.md} | 0 ... 5 - Fusing a mat mul op followed by a scale op on the CPU.md} | 0 ...0-AVX2 Flash Attention 2.md => 50 - AVX2 Flash Attention 2.md} | 0 github-data/pull_requests/{501-Fix #499.md => 501 - Fix _499.md} | 0 ... endpoint that lists all the saved prompt caches to server.md} | 0 ...04 - Add DRY and fix the server to use other new samplers..md} | 0 ...plementation.md => 505 - New IQ4_KT trellis implementation.md} | 0 ...ix non rpc build error.md => 506 - Fix non rpc build error.md} | 0 ...ompile error (C2668).md => 508 - Fix Compile error _C2668_.md} | 0 .../pull_requests/{509-Docs update.md => 509 - Docs update.md} | 0 ... 
Quantized Flash Attention for all supported CPU platforms.md} | 0 ...ection of readme.md => 510 - Update News section of readme.md} | 0 .../pull_requests/{511-New IQ2_KT.md => 511 - New IQ2_KT.md} | 0 ...512 - Add top n sigma sampler in webui and other webui fix.md} | 0 .../{513-add dry sampler.md => 513 - add dry sampler.md} | 0 ...ing.md => 515 - IQ2_XXS_ much faster CPU prompt processing.md} | 0 ...- Much faster iq3_xxs GEMM via repacking to q8_0_r8 _AVX2_.md} | 0 ...ssing.md => 517 - IQ1_S_ much faster CPU prompt processing.md} | 0 ...ssing.md => 518 - IQ3_S_ much faster CPU prompt processing.md} | 0 ...cache.md => 52 - Fix bug and D _ 128 case for Q8_0 k-cache.md} | 0 ...or GPU offload.md => 520 - Better strategy for GPU offload.md} | 0 ...a slightly better GEMV version for IQ2_XXS_ IQ3_XXS_ IQ3_S.md} | 0 ...md => 525 - Faster CPU prompt processing for Q4_K and Q5_K.md} | 0 ...ed in #524_#525.md => 528 - Fix bug introduced in _524_525.md} | 0 ...d IQ4_KT, V2.md => 529 - New IQ2_KT_ IQ3_KT and IQ4_KT_ V2.md} | 0 ...tization mixes tweaks.md => 53 - Quantization mixes tweaks.md} | 0 ... 1).md => 531 - Much faster CPU prompt processing _part 1_.md} | 0 ... 2).md => 533 - Much faster CPU prompt processing _part 2_.md} | 0 ... 3).md => 534 - Much faster CPU prompt processing _part 3_.md} | 0 .../{535-Minor readme update.md => 535 - Minor readme update.md} | 0 ...-Fix KT Neon _ ARM typo.md => 536 - Fix KT Neon _ ARM typo.md} | 0 ...g.md => 537 - Update CMakeLists.txt to fix NDEBUG handling.md} | 0 ....md => 54 - Improve Q4_0 and Q8_0 performance on AVX2_Zen4.md} | 0 ...ange.md => 540 - Fix missed block_q8_x2 bf16 -_ i16 change.md} | 0 ... 
quants.md => 541 - Perhaps slightly faster trellis quants.md} | 0 .../{542-Fix NEON build.md => 542 - Fix NEON build.md} | 0 ...s on ARM_NEON .md => 544 - New integer trellis on ARM_NEON.md} | 0 ...46 - Faster ARM_NEON GEMM implementation for legacy quants.md} | 0 ...ld_ add script to simplify build_test workflow for Android.md} | 0 ... - Much faster prompt processing for IQK quants _ARM_NEON_.md} | 0 ...rmance on AVX2.md => 55 - Improve Q5_0 performance on AVX2.md} | 0 ...50 - Much faster prompt processing for I-quants _ARM_NEON_.md} | 0 ...52 - Much faster prompt processing for k-quants _ARM_NEON_.md} | 0 ...h faster prompt processing for IQ1_S and IQ1_M on ARM_NEON.md} | 0 ...ion.md => 554 - Update README.md to add quickstart section.md} | 0 ...dd Falcon-Edge support.md => 555 - Add Falcon-Edge support.md} | 0 ...for iqX_r4 quants .md => 557 - CUDA_ MMQ for iqX_r4 quants.md} | 0 ...d => 558 - Add mikupad to ik_llama as an alternative WebUI.md} | 0 ...Use cuBLAS for large batches and quants with block size 16.md} | 0 ...{56-BF16 support on Metal.md => 56 - BF16 support on Metal.md} | 0 ...ve what appears to be unnecessary asserts in ggml_cuda_cpy.md} | 0 ... Merge vulkan code from mainline up to commit of 6_28_2025.md} | 0 ...upport for 561.md => 565 - add hunyuan moe support for 561.md} | 0 ...{566-Adding IQ3_KS quants.md => 566 - Adding IQ3_KS quants.md} | 0 ...ed improvement.md => 567 - Minor CUDA PP speed improvement.md} | 0 ...onally disable fused ops when building with Vulkan enabled.md} | 0 ...Zen4 horizontal sums .md => 57 - AVX2_Zen4 horizontal sums.md} | 0 ...- Remove duplicate_misplaced cmake find_package for Vulkan.md} | 0 .../{571-Fix CMakeLists.md => 571 - Fix CMakeLists.md} | 0 ... 
dots.llm1 models.md => 573 - Support for dots.llm1 models.md} | 0 ...ask padding to 64.md => 574 - Change KQ mask padding to 64.md} | 0 ...-Vulkan_ fused rms norm.md => 577 - Vulkan_ fused rms norm.md} | 0 ...pler.md => 578 - Do not crash when there is no DRY sampler.md} | 0 ...h RPC off.md => 579 - Fix debug build failure with RPC off.md} | 0 ...{58-Fix compiler warnings.md => 58 - Fix compiler warnings.md} | 0 ..._MUL_UNARY.md => 580 - Vulkan_ add GGML_OP_FUSED_MUL_UNARY.md} | 0 ...-add for now.md => 581 - Vulkan_ Disable multi-add for now.md} | 0 ...d => 582 - Vulkan_ adding GGML_OP_MULTI_ADD implementation.md} | 0 ...83-Adding forgotten file.md => 583 - Adding forgotten file.md} | 0 ...ls.md => 584 - Vulkan_ flash attention for DeepSeek models.md} | 0 ...kens.md => 585 - Special handling of Seed Coder FIM tokens.md} | 0 ...sampler.md => 587 - Fix crash when there is no DRY sampler.md} | 0 ....md => 588 - Fix server crash when there is no DRY sampler.md} | 0 ...89 - CUDA_ small PP performance improvement for MoE models.md} | 0 ...inor readme update.md => 592 - Another minor readme update.md} | 0 ...593 - Faster prompt processing for IQ2_KS_ IQ2_K_ IQ2_K_R4.md} | 0 ...A_ Faster prompt processing for several quantization types.md} | 0 ...an_ iquants and flash attention split_k_reduce improvement.md} | 0 ...-bit quantization.md => 6 - IQ4_K_ SOTA 4-bit quantization.md} | 0 .../{602-Adding IQ2_KL.md => 602 - Adding IQ2_KL.md} | 0 ...it.md => 603 - Check if MMQ should be used before using it.md} | 0 ...ng..md => 604 - Fix attn_v conditionality when quantizing..md} | 0 ..._ks to constants.py.md => 606 - Add iq3_ks to constants.py.md} | 0 ...md => 607 - vulkan_ support softmax_FA batch and broadcast.md} | 0 ...08-Vulkan_ a fresh start.md => 608 - Vulkan_ a fresh start.md} | 0 ....md => 609 - Added kimi-k2 support _ported from llama.cpp_.md} | 0 ...md => 61 - Adding ability to have meta data per tensor row.md} | 0 ...2 version.md => 610 - q8_k_r8_ experimental AVX512 
version.md} | 0 ... 611 - Bump GGML_MAX_CONTEXTS to allow loading more shards.md} | 0 ...plate.md => 612 - kimi-k2 convert script and chat template.md} | 0 ...TA quants.md => 616 - Adding IQ1_KT - 1.75 bpw SOTA quants.md} | 0 ... indentation.md => 617 - Fixup kimi-k2 convert indentation.md} | 0 ...ew Features for Conversations_ Settings_ and Chat Messages.md} | 0 ...ion.md => 62 - Use fp32 for K_Q in Metal FA implementation.md} | 0 ...8.md => 620 - Bump Windows max open files from 512 to 2048.md} | 0 ...> 622 - Add GGML_MAX_CONTEXTS definition in CMakeLists.txt.md} | 0 .../{624-Quantization tweaks.md => 624 - Quantization tweaks.md} | 0 ...2.md => 628 - _Draft_ Function calling support for Kimi-K2.md} | 0 .../{630-GEMM for IQ1_M.md => 630 - GEMM for IQ1_M.md} | 0 ...64 - Better sub-3-bit quantization mixes with a qkv tensor.md} | 0 ...5-Adding SWIGLU unary op.md => 65 - Adding SWIGLU unary op.md} | 0 ...DA non-contiguous RoPE.md => 66 - CUDA non-contiguous RoPE.md} | 0 ...o fix replace_all.md => 68 - It is time to fix replace_all.md} | 0 .../{69-Allow bf16 kv-cache.md => 69 - Allow bf16 kv-cache.md} | 0 ...K, IQ3_K and IQ5_K.md => 7 - Adding IQ2_K_ IQ3_K and IQ5_K.md} | 0 .../{70-Fused unary(x)_y.md => 70 - Fused unary_x_y.md} | 0 ...iqk_mul_mat_ better srategy when nrc_y not divisible by ny.md} | 0 ...2 - iqk_mul_mat_ better iq4_nl implementation on Zen4_AVX2.md} | 0 ...version.md => 73 - CUDA_ faster float -_ iq4_nl conversion.md} | 0 ...md => 74 - IQ4_NL kv-cache on the CPU _Zen4_AVX2_ARM_NEON_.md} | 0 ...x Q5_0 flash attention.md => 75 - Fix Q5_0 flash attention.md} | 0 ...faster quantization.md => 76 - iq4_nl_ faster quantization.md} | 0 .../pull_requests/{77-Adding Q6_0.md => 77 - Adding Q6_0.md} | 0 ...aster Zen4_AVX2.md => 78 - q6_0_ Slightly faster Zen4_AVX2.md} | 0 ...ry.md => 79 - Do not quantize activations if not necessary.md} | 0 ...e to c++17 projectwide.md => 80 - Move to c_17 projectwide.md} | 0 ...scale fudge factors.md => 81 - Cleanup scale fudge 
factors.md} | 0 ...w IQ4_KS.md => 83 - New SOTA quantization_ 4.25 bpw IQ4_KS.md} | 0 .../{84-Better model info.md => 84 - Better model info.md} | 0 ...tion.md => 85 - IQ2_KS_ 2.1875 bpw non-linear quantization.md} | 0 ...tion.md => 86 - Fix and optimize iq2k Metal implementation.md} | 0 ...oduct.md => 87 - iq3_k_ fix and optimize Metal dot product.md} | 0 ..._ 4.0 bpw quants.md => 89 - Adding IQ4_KSS_ 4.0 bpw quants.md} | 0 ...D-ified GeLU .md => 9 - Fused soft cap and SIMD-ified GeLU.md} | 0 ...ct on Metal.md => 90 - iq4_ks_ faster dot product on Metal.md} | 0 ... CLI - Specify GGML_TYPE to quantize for the main tensors..md} | 0 ...re.md => 93 - Attempt to blindly fix Windows build failure.md} | 0 ...pproach.md => 94 - Adding _agray3_s graph caching approach.md} | 0 ...ant strategies_ attn_q Q4 _ attn_v Q6 for Llama 3.1 Q5_K_S.md} | 0 ...ptional.md => 97 - Bitnet_ make the scale tensors optional.md} | 0 ...oken.md => 98 - Avoid rebuild of GGML graph for each token.md} | 0 ..._NL for KV-cache in token generation using Flash Attention.md} | 0 626 files changed, 0 insertions(+), 0 deletions(-) rename github-data/discussions/{100-New argument _ env variable for GGML_SCHED_MAX_COPIES_.md => 100 - New argument _ env variable for GGML_SCHED_MAX_COPIES_.md} (100%) rename github-data/discussions/{104-Convenience improvements for llama-quantize.md => 104 - Convenience improvements for llama-quantize.md} (100%) rename github-data/discussions/{140-Questions about weight[j].md => 140 - Questions about weight_j_.md} (100%) rename github-data/discussions/{15-Will LQER improve k- and i-quants_.md => 15 - Will LQER improve k- and i-quants_.md} (100%) rename github-data/discussions/{164-Latest CPU performance comparison with llama.cpp.md => 164 - Latest CPU performance comparison with llama.cpp.md} (100%) rename github-data/discussions/{165-Norm RMS Epsilon.md => 165 - Norm RMS Epsilon.md} (100%) rename github-data/discussions/{166-Learning more LLM quantization.md => 166 - 
Learning more LLM quantization.md} (100%) rename github-data/discussions/{18-CPU beating GPU in token generation speed.md => 18 - CPU beating GPU in token generation speed.md} (100%) rename github-data/discussions/{201-What is the NUMA situation _.md => 201 - What is the NUMA situation _.md} (100%) rename github-data/discussions/{211-help me create an importance matrix primer.md => 211 - help me create an importance matrix primer.md} (100%) rename github-data/discussions/{223-Recent performance testing with DeepSeek R1.md => 223 - Recent performance testing with DeepSeek R1.md} (100%) rename github-data/discussions/{242-Switching from llama.cpp_ktransformers, seeking advice_guidance.md => 242 - Switching from llama.cpp_ktransformers_ seeking advice_guidance.md} (100%) rename github-data/discussions/{25-CPU prompt processing speed for large contexts.md => 25 - CPU prompt processing speed for large contexts.md} (100%) rename github-data/discussions/{256-Diverging from llama.cpp.md => 256 - Diverging from llama.cpp.md} (100%) rename github-data/discussions/{258-Quick-start Guide coming over from llama.cpp and ktransformers!.md => 258 - Quick-start Guide coming over from llama.cpp and ktransformers_.md} (100%) rename github-data/discussions/{266-Benchmarking DeepSeek R1 - 16x3090.md => 266 - Benchmarking DeepSeek R1 - 16x3090.md} (100%) rename github-data/discussions/{286-Testing `deepseek-ai_DeepSeek-V3-0324` model support..md => 286 - Testing _deepseek-ai_DeepSeek-V3-0324_ model support..md} (100%) rename github-data/discussions/{288-On @compilade's PR 12557 and @jukofyork's quantization ideas.md => 288 - On _compilade_s PR 12557 and _jukofyork_s quantization ideas.md} (100%) rename github-data/discussions/{316-Mainline is now copying stuff from ik_llama.cpp.md => 316 - Mainline is now copying stuff from ik_llama.cpp.md} (100%) rename github-data/discussions/{319-KTransformers copying ik_llama.cpp.md => 319 - KTransformers copying ik_llama.cpp.md} (100%) rename 
github-data/discussions/{323-Is there an easy way to repack an existing GGUF so it could be used without --run-time-repack (thus enabling mmap).md => 323 - Is there an easy way to repack an existing GGUF so it could be used wit.md} (100%) rename github-data/discussions/{334-`iq4_ks` performs great on gemma-3-27b-it-qat-q4_0-unquantized.md => 334 - _iq4_ks_ performs great on gemma-3-27b-it-qat-q4_0-unquantized.md} (100%) rename github-data/discussions/{350-Maverick slow prompt with gpu.md => 350 - Maverick slow prompt with gpu.md} (100%) rename github-data/discussions/{354-Not all MLAs are born equal.md => 354 - Not all MLAs are born equal.md} (100%) rename github-data/discussions/{357-Qwen3 - early performance comparisons.md => 357 - Qwen3 - early performance comparisons.md} (100%) rename github-data/discussions/{359-Qwen3 quantization experiments.md => 359 - Qwen3 quantization experiments.md} (100%) rename github-data/discussions/{372-multy gpu.md => 372 - multy gpu.md} (100%) rename github-data/discussions/{384-ik_llama.cpp issues on an old workstation.md => 384 - ik_llama.cpp issues on an old workstation.md} (100%) rename github-data/discussions/{385-Qwen3 235B performance on Intel Xeon Scalable processor.md => 385 - Qwen3 235B performance on Intel Xeon Scalable processor.md} (100%) rename github-data/discussions/{393-Creating quantized models.md => 393 - Creating quantized models.md} (100%) rename github-data/discussions/{395-Why does imatrix not tokenize special tokens_.md => 395 - Why does imatrix not tokenize special tokens_.md} (100%) rename github-data/discussions/{396-Best settings for Maverick - Dual CPU Xeon 8480+ - RTX 3090.md => 396 - Best settings for Maverick - Dual CPU Xeon 8480_ - RTX 3090.md} (100%) rename github-data/discussions/{397-KV split while using `-sm row`.md => 397 - KV split while using _-sm row_.md} (100%) rename github-data/discussions/{399-Qwen 30b.A3b IK_LCPP comparisons on lowspec machine.md => 399 - Qwen 30b.A3b IK_LCPP 
comparisons on lowspec machine.md} (100%) rename github-data/discussions/{401-install bitnet (or other cpu models) on a fresh termux aarch64.md => 401 - install bitnet _or other cpu models_ on a fresh termux aarch64.md} (100%) rename github-data/discussions/{403-Tool Calling and Structured Response (Json Mode) support.md => 403 - Tool Calling and Structured Response _Json Mode_ support.md} (100%) rename github-data/discussions/{434-Quant Cookers Basic Guide.md => 434 - Quant Cookers Basic Guide.md} (100%) rename github-data/discussions/{451-Context reuse _ context shift for long prompts.md => 451 - Context reuse _ context shift for long prompts.md} (100%) rename github-data/discussions/{459-qwen3 metrics on ancient hardware (2x xeon Vs 2x P100).md => 459 - qwen3 metrics on ancient hardware _2x xeon Vs 2x P100_.md} (100%) rename github-data/discussions/{466-A curiosity..md => 466 - A curiosity..md} (100%) rename github-data/discussions/{477-DeepSeek-R1-0528 ik quants!.md => 477 - DeepSeek-R1-0528 ik quants_.md} (100%) rename github-data/discussions/{491--rtr actually hurts prompt t_s for large ubatch_.md => 491 - -rtr actually hurts prompt t_s for large ubatch_.md} (100%) rename github-data/discussions/{519-Android Build.md => 519 - Android Build.md} (100%) rename github-data/discussions/{526-Partial requant feature to save compute and time during tests..md => 526 - Partial requant feature to save compute and time during tests..md} (100%) rename github-data/discussions/{532-Guidance on GPU Layer Offloading Strategy in ik_llama.cpp for Multi GPU Rig (2x5090 + 2x4090).md => 532 - Guidance on GPU Layer Offloading Strategy in ik_llama.cpp for Multi GPU.md} (100%) rename github-data/discussions/{543-dots.llm1 support and thanks.md => 543 - dots.llm1 support and thanks.md} (100%) rename github-data/discussions/{545-Vulkan support_.md => 545 - Vulkan support_.md} (100%) rename github-data/discussions/{548-Poor performance with bf16 model on Qwen3 30B-A3B.md => 548 - Poor 
performance with bf16 model on Qwen3 30B-A3B.md} (100%) rename github-data/discussions/{556-ik_llama.cpp for Armv8.0.md => 556 - ik_llama.cpp for Armv8.0.md} (100%) rename github-data/discussions/{562-AMD GPU Vulkan & ROCm_HIP Discussion.md => 562 - AMD GPU Vulkan _ ROCm_HIP Discussion.md} (100%) rename github-data/discussions/{564-Maybe an interesting CUDA PR here..md => 564 - Maybe an interesting CUDA PR here..md} (100%) rename github-data/discussions/{586-Slow KV cache rm operation.md => 586 - Slow KV cache rm operation.md} (100%) rename github-data/discussions/{590-How important is Vulkan back-end development_.md => 590 - How important is Vulkan back-end development_.md} (100%) rename github-data/discussions/{591-I dont see any speed improvement in generation, so want to understand if i am missing something.md => 591 - I dont see any speed improvement in generation_ so want to understand i.md} (100%) rename github-data/discussions/{594-Is AVX2 a hard requirement on x64_.md => 594 - Is AVX2 a hard requirement on x64_.md} (100%) rename github-data/discussions/{599-mla matrix absorbtion.md => 599 - mla matrix absorbtion.md} (100%) rename github-data/discussions/{613-Pathological Quant_CUDA combinations -- How to know what works_.md => 613 - Pathological Quant_CUDA combinations -- How to know what works_.md} (100%) rename github-data/discussions/{619-gpu p2p utilization.md => 619 - gpu p2p utilization.md} (100%) rename github-data/discussions/{621-Deepseek v3_r1 poisoned prompt_.md => 621 - Deepseek v3_r1 poisoned prompt_.md} (100%) rename github-data/discussions/{623-Quantizing panels_bundles instead of blocks_.md => 623 - Quantizing panels_bundles instead of blocks_.md} (100%) rename github-data/discussions/{63-LLaMA-3.2 quantization evaluation.md => 63 - LLaMA-3.2 quantization evaluation.md} (100%) rename github-data/discussions/{8-New quantization types IQ2_K, IQ3_K, IQ4_K, IQ5_K.md => 8 - New quantization types IQ2_K_ IQ3_K_ IQ4_K_ IQ5_K.md} (100%) rename 
github-data/discussions/{82-4bpw GGML TYPE_.md => 82 - 4bpw GGML TYPE_.md} (100%) rename github-data/discussions/{95-Bitnet.md => 95 - Bitnet.md} (100%) rename github-data/issues/{103-Bug_ K cache without FA.md => 103 - Bug_ K cache without FA.md} (100%) rename github-data/issues/{133-Refactor_ update ggml library_.md => 133 - Refactor_ update ggml library_.md} (100%) rename github-data/issues/{159-Feature Request_ steps how to compile as cmake i struction on the origi al repo not work here..md => 159 - Feature Request_ steps how to compile as cmake i struction on the origi.md} (100%) rename github-data/issues/{160-Bug_ Can't compile on MSVC 2022.md => 160 - Bug_ Can_t compile on MSVC 2022.md} (100%) rename github-data/issues/{167-Bug_ Unable to quantize Falcon 10B 1.58 bitnet model.md => 167 - Bug_ Unable to quantize Falcon 10B 1.58 bitnet model.md} (100%) rename github-data/issues/{183-Refactor_ iqk_mul_mat.md => 183 - Refactor_ iqk_mul_mat.md} (100%) rename github-data/issues/{196-Refactor_ remove usage of Q8_1 for activation quantization.md => 196 - Refactor_ remove usage of Q8_1 for activation quantization.md} (100%) rename github-data/issues/{199-Bug_ Changing system_prompt on llama-server at runtime breaks parallel processing.md => 199 - Bug_ Changing system_prompt on llama-server at runtime breaks parallel .md} (100%) rename github-data/issues/{203-Bug_ Compliation Error for Intel(R) Xeon(R) Gold 6326 CPU.md => 203 - Bug_ Compliation Error for Intel_R_ Xeon_R_ Gold 6326 CPU.md} (100%) rename github-data/issues/{209-Does the iqk_mul_mat.cpp support 1.58-bit quantization model_.md => 209 - Does the iqk_mul_mat.cpp support 1.58-bit quantization model_.md} (100%) rename github-data/issues/{214-AVX512 build error.md => 214 - AVX512 build error.md} (100%) rename github-data/issues/{217-Bug_ CPU FA with fp16 K-cache is broken.md => 217 - Bug_ CPU FA with fp16 K-cache is broken.md} (100%) rename github-data/issues/{224-Bug_ IQK_FA_ALL_QUANTS causes failure to 
compile.md => 224 - Bug_ IQK_FA_ALL_QUANTS causes failure to compile.md} (100%) rename github-data/issues/{227-Prevent FA usage on CUDA when K and V head sizes are different.md => 227 - Prevent FA usage on CUDA when K and V head sizes are different.md} (100%) rename github-data/issues/{228-Feature Request_ create tool to offline repack models.md => 228 - Feature Request_ create tool to offline repack models.md} (100%) rename github-data/issues/{230-Weird assert when using online repacking.md => 230 - Weird assert when using online repacking.md} (100%) rename github-data/issues/{245-Bug_ Perplexity returns NaN with IQ4_KSS quantisation.md => 245 - Bug_ Perplexity returns NaN with IQ4_KSS quantisation.md} (100%) rename github-data/issues/{249-CUDA_ results for MoE models are not reproducible.md => 249 - CUDA_ results for MoE models are not reproducible.md} (100%) rename github-data/issues/{254-Split-mode row.md => 254 - Split-mode row.md} (100%) rename github-data/issues/{255-Feature Request_ dynamic layer by layer offloading during prompt processing for VRAM constrained scenarios.md => 255 - Feature Request_ dynamic layer by layer offloading during prompt proces.md} (100%) rename github-data/issues/{257-Bug_ mla=2 in llama-server will crash when request done.md => 257 - Bug_ mla_2 in llama-server will crash when request done.md} (100%) rename github-data/issues/{26-Feature Request_ Improve CPU processing speed for large contexts.md => 26 - Feature Request_ Improve CPU processing speed for large contexts.md} (100%) rename github-data/issues/{263-Benchmarking DeepSeek R1 - 16x3090.md => 263 - Benchmarking DeepSeek R1 - 16x3090.md} (100%) rename github-data/issues/{267-Feature Request_ HugePage mmap alloc for DeepSeek V3_R1.md => 267 - Feature Request_ HugePage mmap alloc for DeepSeek V3_R1.md} (100%) rename github-data/issues/{271-Possible regression computing `wk_b` tensors on the fly after PR #265.md => 271 - Possible regression computing _wk_b_ tensors on the fly 
after PR _265.md} (100%) rename github-data/issues/{281-Bug_ Strange dips in TG performance.md => 281 - Bug_ Strange dips in TG performance.md} (100%) rename github-data/issues/{285-llama-perplexity giving all NaNs on unsloth Q8_0 quant.md => 285 - llama-perplexity giving all NaNs on unsloth Q8_0 quant.md} (100%) rename github-data/issues/{29-Bug_ some ifdefs missing in ggml_src_iqk_iqk_quantize.cpp.md => 29 - Bug_ some ifdefs missing in ggml_src_iqk_iqk_quantize.cpp.md} (100%) rename github-data/issues/{293-Feature Request_ IQ6_K row interleaved quant.md => 293 - Feature Request_ IQ6_K row interleaved quant.md} (100%) rename github-data/issues/{296-Possible numerical stability issue with experimental quant of DeepSeek-V3-0324_.md => 296 - Possible numerical stability issue with experimental quant of DeepSeek-.md} (100%) rename github-data/issues/{297-Update gguf-py scripts to support new quant types..md => 297 - Update gguf-py scripts to support new quant types..md} (100%) rename github-data/issues/{30-Bug_ Appcrash on Windows 7 with GGML_USE_IQK_MULMAT.md => 30 - Bug_ Appcrash on Windows 7 with GGML_USE_IQK_MULMAT.md} (100%) rename github-data/issues/{300-Bug_ IQK_FA_ALL_QUANTS causes failure to compile.md => 300 - Bug_ IQK_FA_ALL_QUANTS causes failure to compile.md} (100%) rename github-data/issues/{305-Gibberish output when using DeepSeek-V3-0324-IQ2_K_R4 on mixed CPU + 4 GPUs with -mla (1 or 2).md => 305 - Gibberish output when using DeepSeek-V3-0324-IQ2_K_R4 on mixed CPU _ 4 .md} (100%) rename github-data/issues/{306-Confused by the -mla flag. What's supported_.md => 306 - Confused by the -mla flag. 
What_s supported_.md} (100%) rename github-data/issues/{308-Bug_ Compiling for arm64, error_ cannot convert ‘const uint32x4_t’ to ‘uint8x16_t’ and similar errors.md => 308 - Bug_ Compiling for arm64_ error_ cannot convert _const uint32x4_t_ to _.md} (100%) rename github-data/issues/{314-Llama 4 Support_.md => 314 - Llama 4 Support_.md} (100%) rename github-data/issues/{322-Speculative decoding support.md => 322 - Speculative decoding support.md} (100%) rename github-data/issues/{335-Bug_ Llama 4 generates garbage with longer context (64K+; the issue is not present in the llama.cpp).md => 335 - Bug_ Llama 4 generates garbage with longer context _64K_ the issue is n.md} (100%) rename github-data/issues/{339-Bug_ bitnet2b_2501 template issues.md => 339 - Bug_ bitnet2b_2501 template issues.md} (100%) rename github-data/issues/{34-Bug_ FA fails when processing prompt lengths that are not a multiple of 8.md => 34 - Bug_ FA fails when processing prompt lengths that are not a multiple of .md} (100%) rename github-data/issues/{340-Bug_ _unknown model architecture_ 'cohere2'_ when trying to load Command A model.md => 340 - Bug_ _unknown model architecture_ _cohere2_ when trying to load Command.md} (100%) rename github-data/issues/{345-build question newbie.md => 345 - build question newbie.md} (100%) rename github-data/issues/{353-Binaries releases for Windows _.md => 353 - Binaries releases for Windows _.md} (100%) rename github-data/issues/{358-Bug_ IQK_FA_ALL_QUANTS causes failure to compile.md => 358 - Bug_ IQK_FA_ALL_QUANTS causes failure to compile.md} (100%) rename github-data/issues/{361-Bug_ Build not detecting some supported ARM CPUs.md => 361 - Bug_ Build not detecting some supported ARM CPUs.md} (100%) rename github-data/issues/{362-README language is vague wrt. _quantization improvements_.md => 362 - README language is vague wrt. 
_quantization improvements_.md} (100%) rename github-data/issues/{363-Bug_ Gibberish output when using flash attention using Mistral-Small-Instruct-2409-Q6_K and Gemma-3-12b-it-q4_0 on CPU.md => 363 - Bug_ Gibberish output when using flash attention using Mistral-Small-I.md} (100%) rename github-data/issues/{365-Bug_ Updated BitNet arch bitnet-b1.58.md => 365 - Bug_ Updated BitNet arch bitnet-b1.58.md} (100%) rename github-data/issues/{367-Bug_ IQ1_S_R4, IQ1_M_R4 failed on Qwen3-235B-A22B.md => 367 - Bug_ IQ1_S_R4_ IQ1_M_R4 failed on Qwen3-235B-A22B.md} (100%) rename github-data/issues/{373-DeepSeekV3 0324 can't load newest UD quants (with MLA). Older quant works but with slower pre processing than gen speed (CPU + CUDA).md => 373 - DeepSeekV3 0324 can_t load newest UD quants _with MLA_. Older quant wor.md} (100%) rename github-data/issues/{376-Bug_ unknown model architecture_ 'deci' (when loading Llama-3_1-Nemotron-Ultra-253B).md => 376 - Bug_ unknown model architecture_ _deci_ _when loading Llama-3_1-Nemotro.md} (100%) rename github-data/issues/{378-Feature Request_ Use ik_llama.cpp with llama-cpp-python.md => 378 - Feature Request_ Use ik_llama.cpp with llama-cpp-python.md} (100%) rename github-data/issues/{379-Bug_ Cannot build on WoA.md => 379 - Bug_ Cannot build on WoA.md} (100%) rename github-data/issues/{380-Drop at the start of generation.md => 380 - Drop at the start of generation.md} (100%) rename github-data/issues/{381-ik_llama.cpp_ggml_src_ggml-cuda_fattn.cu_66_ fatal error after latest.md => 381 - ik_llama.cpp_ggml_src_ggml-cuda_fattn.cu_66_ fatal error after latest.md} (100%) rename github-data/issues/{383-Bug_ Loading DeepSeek R1T Chimera causes _llama_model_load_ error loading model_ check_tensor_dims_ tensor 'blk.0.attn_q_b.weight' has wrong shape; expected 1536, 73728, got 1536, 24576, 1, => 383 - Bug_ Loading DeepSeek R1T Chimera causes _llama_model_load_ error loadi.md} (100%) rename github-data/issues/{387-Bug_ bitnet 1.58 on termux 
segmentation fault.md => 387 - Bug_ bitnet 1.58 on termux segmentation fault.md} (100%) rename github-data/issues/{388-Bug_ Clash with mainline llama.cpp .so files.md => 388 - Bug_ Clash with mainline llama.cpp .so files.md} (100%) rename github-data/issues/{389-Bug_ llama-batched-bench crashed with batch size _2.md => 389 - Bug_ llama-batched-bench crashed with batch size _2.md} (100%) rename github-data/issues/{398-Bug_ -fmoe causing illegal memory access.md => 398 - Bug_ -fmoe causing illegal memory access.md} (100%) rename github-data/issues/{407-Feature Request_ Support for function calling in llama-server.md => 407 - Feature Request_ Support for function calling in llama-server.md} (100%) rename github-data/issues/{412-Bug_ Static asserts trip during compile..md => 412 - Bug_ Static asserts trip during compile..md} (100%) rename github-data/issues/{419-qwen3 metrics in expert parallel(2x P100).md => 419 - qwen3 metrics in expert parallel_2x P100_.md} (100%) rename github-data/issues/{420-Bug_ standard attention is broken.md => 420 - Bug_ standard attention is broken.md} (100%) rename github-data/issues/{423-Bug_ Compile failure undefined reference to `void mul_mat_q_case.md => 423 - Bug_ Compile failure undefined reference to _void mul_mat_q_case.md} (100%) rename github-data/issues/{425-Bug_ CUDA error_ an illegal memory access was encountered.md => 425 - Bug_ CUDA error_ an illegal memory access was encountered.md} (100%) rename github-data/issues/{432-Refactor_ GGUF v14 broke compatibility with IQx_KS quants.md => 432 - Refactor_ GGUF v14 broke compatibility with IQx_KS quants.md} (100%) rename github-data/issues/{433-Feature Request_ CORS support.md => 433 - Feature Request_ CORS support.md} (100%) rename github-data/issues/{436-Bug_ Saving the prompt cache causes Segfault.md => 436 - Bug_ Saving the prompt cache causes Segfault.md} (100%) rename github-data/issues/{437-Feature Request_ support intel amx for further accelerate.md => 437 - Feature Request_ 
support intel amx for further accelerate.md} (100%) rename github-data/issues/{440-Feature Request_ Top n-sigma sampler.md => 440 - Feature Request_ Top n-sigma sampler.md} (100%) rename github-data/issues/{447-Compilation Error_ Error C2676.md => 447 - Compilation Error_ Error C2676.md} (100%) rename github-data/issues/{450-Bug_ Performance regression.md => 450 - Bug_ Performance regression.md} (100%) rename github-data/issues/{452-Falcon H1 Support.md => 452 - Falcon H1 Support.md} (100%) rename github-data/issues/{455-Bug_ KV cache is never reused in OpenAI compatible Chat Completion api.md => 455 - Bug_ KV cache is never reused in OpenAI compatible Chat Completion api.md} (100%) rename github-data/issues/{456-Bug_ no compilation without IQK_MULMAT.md => 456 - Bug_ no compilation without IQK_MULMAT.md} (100%) rename github-data/issues/{463-Research_ V100 Flash Attention Implementation.md => 463 - Research_ V100 Flash Attention Implementation.md} (100%) rename github-data/issues/{464-Bug_ The streaming every couple of rows blocks for 5-8s.md => 464 - Bug_ The streaming every couple of rows blocks for 5-8s.md} (100%) rename github-data/issues/{467-Bug_ Server does not send data_ [DONE] for OpenAI-compatible streaming endpoint `_v1_chat_completions`.md => 467 - Bug_ Server does not send data_ _DONE_ for OpenAI-compatible streaming .md} (100%) rename github-data/issues/{472-Bug_ Don't build ggml-aarch64 regardless of CPU arch type.md => 472 - Bug_ Don_t build ggml-aarch64 regardless of CPU arch type.md} (100%) rename github-data/issues/{474-Bug_ Perf Regression in PP throughput after Pull #461 (...R4 CUDA impl).md => 474 - Bug_ Perf Regression in PP throughput after Pull _461 _...R4 CUDA impl_.md} (100%) rename github-data/issues/{476-Research_ performance divergence.md => 476 - Research_ performance divergence.md} (100%) rename github-data/issues/{479-Bug_ _ggml_backend_cuda_graph_compute_ disabling CUDA graphs due to GPU architecture_ flood.md => 479 - Bug_ 
_ggml_backend_cuda_graph_compute_ disabling CUDA graphs due to GPU.md} (100%) rename github-data/issues/{485-Bug_ Illegal Memory Access loading model to CUDA1.md => 485 - Bug_ Illegal Memory Access loading model to CUDA1.md} (100%) rename github-data/issues/{490-Bug_ Performance drop with 14292913 #461.md => 490 - Bug_ Performance drop with 14292913 _461.md} (100%) rename github-data/issues/{498-question_ about quantize method.md => 498 - question_ about quantize method.md} (100%) rename github-data/issues/{499-Bug_ cache quantization crash with IQK_FORCE_BF16.md => 499 - Bug_ cache quantization crash with IQK_FORCE_BF16.md} (100%) rename github-data/issues/{500-Bug_ Insane cudaMalloc OOM Error on Dual 3090 GPUs.md => 500 - Bug_ Insane cudaMalloc OOM Error on Dual 3090 GPUs.md} (100%) rename github-data/issues/{503-Bug_ server_cli fails with segmentation fault.md => 503 - Bug_ server_cli fails with segmentation fault.md} (100%) rename github-data/issues/{507-Compatible gguf models _.md => 507 - Compatible gguf models _.md} (100%) rename github-data/issues/{514-CUDA Kernel Error on RTX 5090 (Compute Capability 12.0)_ _no kernel image is available for execution on the device_.md => 514 - CUDA Kernel Error on RTX 5090 _Compute Capability 12.0_ _no kernel imag.md} (100%) rename github-data/issues/{521-When offloading semi layers to some GPUs with -ot, TG t_s performance tanks (CUDA + CPU, DeepSeek V3-R1), while not on main llamacpp..md => 521 - When offloading semi layers to some GPUs with -ot_ TG t_s performance t.md} (100%) rename github-data/issues/{522-Bug_ disabling CUDA graphs due to mul_mat_id.md => 522 - Bug_ disabling CUDA graphs due to mul_mat_id.md} (100%) rename github-data/issues/{523-Bug_ tg speed drop after https___github.com_ikawrakow_ik_llama.cpp_pull_518.md => 523 - Bug_ tg speed drop after https_github.com_ikawrakow_ik_llama.cpp_pull_5.md} (100%) rename github-data/issues/{527-Bug_ Webui improvement #481 core dump with a certain question..md => 527 - 
Bug_ Webui improvement _481 core dump with a certain question..md} (100%) rename github-data/issues/{530-Getting crash on second prompt..md => 530 - Getting crash on second prompt..md} (100%) rename github-data/issues/{538-Bug_ GGML_ASSERT failed at first prompt.md => 538 - Bug_ GGML_ASSERT failed at first prompt.md} (100%) rename github-data/issues/{539-Bug_ garbage output.md => 539 - Bug_ garbage output.md} (100%) rename github-data/issues/{551-Feature Request_ Support for Falcon Edge series.md => 551 - Feature Request_ Support for Falcon Edge series.md} (100%) rename github-data/issues/{561-Feature Request_ Tencent Hunyuan-A13B model support.md => 561 - Feature Request_ Tencent Hunyuan-A13B model support.md} (100%) rename github-data/issues/{568-Feature Request_ ERNIE MoE Model Support.md => 568 - Feature Request_ ERNIE MoE Model Support.md} (100%) rename github-data/issues/{572-Bug_ Oops(ggml_compute_forward_sum_rows_f32, ffn_moe_weights_sum-60)_ found nan, on DeepSeek V3_R1 on CUDA + CPU.md => 572 - Bug_ Oops_ggml_compute_forward_sum_rows_f32_ ffn_moe_weights_sum-60_ fo.md} (100%) rename github-data/issues/{575-Bug_ llama-server crash with sampling order.md => 575 - Bug_ llama-server crash with sampling order.md} (100%) rename github-data/issues/{576-Bug_ llama-server crash with _Deepseek2 does not support K-shift_.md => 576 - Bug_ llama-server crash with _Deepseek2 does not support K-shift_.md} (100%) rename github-data/issues/{59-Bug_ GGML Compilation Error_ undefined references to `iqk_mul_mat'.md => 59 - Bug_ GGML Compilation Error_ undefined references to _iqk_mul_mat_.md} (100%) rename github-data/issues/{596-Bug_ Lastest commit broke llama-cli on Windows - mmq.cuh_107_ fatal error.md => 596 - Bug_ Lastest commit broke llama-cli on Windows - mmq.cuh_107_ fatal err.md} (100%) rename github-data/issues/{597-Feature Request_ Add THUDM_GLM-4-MoE-100B-A10B support.md => 597 - Feature Request_ Add THUDM_GLM-4-MoE-100B-A10B support.md} (100%) rename 
github-data/issues/{60-Bug_ Illegal instruction on NEON and Q4_0_4_4.md => 60 - Bug_ Illegal instruction on NEON and Q4_0_4_4.md} (100%) rename github-data/issues/{600-Feature Request_ Port --reasoning-budget from main llamacpp (llamaserver).md => 600 - Feature Request_ Port --reasoning-budget from main llamacpp _llamaserve.md} (100%) rename github-data/issues/{601-Bug_ llama-imatrix crashing.md => 601 - Bug_ llama-imatrix crashing.md} (100%) rename github-data/issues/{605-Bug_ IQ3_KS missing from GGMLQuantizationType - gguf_reader.py script cannot process IQ3_KS tensors.md => 605 - Bug_ IQ3_KS missing from GGMLQuantizationType - gguf_reader.py script c.md} (100%) rename github-data/issues/{614-Feature Request_ port no-mmproj-offload.md => 614 - Feature Request_ port no-mmproj-offload.md} (100%) rename github-data/issues/{615-Bug_ Gemma3 Vision not working.md => 615 - Bug_ Gemma3 Vision not working.md} (100%) rename github-data/issues/{625-Bug_ undefined symbol errors after successful compilation.md => 625 - Bug_ undefined symbol errors after successful compilation.md} (100%) rename github-data/issues/{626-Feature Request_ Add IQK GEMM for IQ1_M.md => 626 - Feature Request_ Add IQK GEMM for IQ1_M.md} (100%) rename github-data/issues/{627-Feature Request_ Tensor Parallelism.md => 627 - Feature Request_ Tensor Parallelism.md} (100%) rename github-data/issues/{629-Multi-GPU performance (Windows) is significantly worse than single-GPU.md => 629 - Multi-GPU performance _Windows_ is significantly worse than single-GPU.md} (100%) rename github-data/issues/{67-Feature Request_ Elliminate_reduce unnecessary copies .md => 67 - Feature Request_ Elliminate_reduce unnecessary copies.md} (100%) rename github-data/issues/{88-Bug_ Won't compile on MSVC.md => 88 - Bug_ Won_t compile on MSVC.md} (100%) rename github-data/issues/{92-Bug_ Quantized KV cache produces garbage in situation where llama.cpp does not.md => 92 - Bug_ Quantized KV cache produces garbage in situation where 
llama.cpp do.md} (100%) rename github-data/pull_requests/{1-Offload Bitnet token embeddings to the GPU.md => 1 - Offload Bitnet token embeddings to the GPU.md} (100%) rename github-data/pull_requests/{10-iq4_k_ speedup quantization by a factor of ~2.md => 10 - iq4_k_ speedup quantization by a factor of _2.md} (100%) rename github-data/pull_requests/{101-Enable q6_0 in flash attention.md => 101 - Enable q6_0 in flash attention.md} (100%) rename github-data/pull_requests/{102-Add support for Granite and GraniteMoE models.md => 102 - Add support for Granite and GraniteMoE models.md} (100%) rename github-data/pull_requests/{105-Fix quantized k-cache without FA.md => 105 - Fix quantized k-cache without FA.md} (100%) rename github-data/pull_requests/{106-Bitnet changes.md => 106 - Bitnet changes.md} (100%) rename github-data/pull_requests/{107-Faster IQ1_BN Metal implementation.md => 107 - Faster IQ1_BN Metal implementation.md} (100%) rename github-data/pull_requests/{108-Another Bitnet performance improvement on Metal.md => 108 - Another Bitnet performance improvement on Metal.md} (100%) rename github-data/pull_requests/{109-Bitnet CUDA improvements.md => 109 - Bitnet CUDA improvements.md} (100%) rename github-data/pull_requests/{11-Faster iq3_k and iq5_k quantization.md => 11 - Faster iq3_k and iq5_k quantization.md} (100%) rename github-data/pull_requests/{110-Bitnet_ use the fused mul-silu in the FFN network.md => 110 - Bitnet_ use the fused mul-silu in the FFN network.md} (100%) rename github-data/pull_requests/{111-Use fused mul - unary op also for MoE models.md => 111 - Use fused mul - unary op also for MoE models.md} (100%) rename github-data/pull_requests/{112-Faster MoE inference.md => 112 - Faster MoE inference.md} (100%) rename github-data/pull_requests/{113-Trellis quantization.md => 113 - Trellis quantization.md} (100%) rename github-data/pull_requests/{114-MMQ Kernel for Q6_0 (pretty please!).md => 114 - MMQ Kernel for Q6_0 _pretty please_.md} (100%) 
rename github-data/pull_requests/{115-MMQ for Q6_0.md => 115 - MMQ for Q6_0.md} (100%) rename github-data/pull_requests/{116-Use Q6_0 instead of Q5_1 for tensors incompatible with IQ5_K_Q5_K.md => 116 - Use Q6_0 instead of Q5_1 for tensors incompatible with IQ5_K_Q5_K.md} (100%) rename github-data/pull_requests/{117-Some minor quant strategies tweaks.md => 117 - Some minor quant strategies tweaks.md} (100%) rename github-data/pull_requests/{118-IQ4_NL_X4.md => 118 - IQ4_NL_X4.md} (100%) rename github-data/pull_requests/{119-Q4_0_R4.md => 119 - Q4_0_R4.md} (100%) rename github-data/pull_requests/{12-q2_K_ allow it to detect ternary nets and quantize accordingly.md => 12 - q2_K_ allow it to detect ternary nets and quantize accordingly.md} (100%) rename github-data/pull_requests/{120-Q8_0_R4.md => 120 - Q8_0_R4.md} (100%) rename github-data/pull_requests/{121-Q5_0_R4.md => 121 - Q5_0_R4.md} (100%) rename github-data/pull_requests/{122-Q6_0_R4.md => 122 - Q6_0_R4.md} (100%) rename github-data/pull_requests/{123-IQ4_XS_R4.md => 123 - IQ4_XS_R4.md} (100%) rename github-data/pull_requests/{124-iq2_bn_r4_ fastest Bitnet CPU implementation on the planet.md => 124 - iq2_bn_r4_ fastest Bitnet CPU implementation on the planet.md} (100%) rename github-data/pull_requests/{125-R4 improvements on ARM_NEON.md => 125 - R4 improvements on ARM_NEON.md} (100%) rename github-data/pull_requests/{126-Rename iq4_nl_x4 to iq4_nl_r4.md => 126 - Rename iq4_nl_x4 to iq4_nl_r4.md} (100%) rename github-data/pull_requests/{127-Q4_0_R4 on CUDA.md => 127 - Q4_0_R4 on CUDA.md} (100%) rename github-data/pull_requests/{128-Faster IQ4_XS_R4 on Zen4.md => 128 - Faster IQ4_XS_R4 on Zen4.md} (100%) rename github-data/pull_requests/{129-Q4_K_R4.md => 129 - Q4_K_R4.md} (100%) rename github-data/pull_requests/{13-Adding IQ2_TN for use with ternary models.md => 13 - Adding IQ2_TN for use with ternary models.md} (100%) rename github-data/pull_requests/{130-Q6_K_R4.md => 130 - Q6_K_R4.md} (100%) rename 
github-data/pull_requests/{131-Slightly faster Q4_K_R4 and IQ4_XS_R4 on Zen4.md => 131 - Slightly faster Q4_K_R4 and IQ4_XS_R4 on Zen4.md} (100%) rename github-data/pull_requests/{132-Q5_K_R4.md => 132 - Q5_K_R4.md} (100%) rename github-data/pull_requests/{134-Q3_K_R4.md => 134 - Q3_K_R4.md} (100%) rename github-data/pull_requests/{135-Better ARM_NEON implementation for R4 quants.md => 135 - Better ARM_NEON implementation for R4 quants.md} (100%) rename github-data/pull_requests/{136-Q2_K_R4.md => 136 - Q2_K_R4.md} (100%) rename github-data/pull_requests/{137-Fix AVX2 implementation of iq4_nl_r4.md => 137 - Fix AVX2 implementation of iq4_nl_r4.md} (100%) rename github-data/pull_requests/{138-IQ4_K_R4.md => 138 - IQ4_K_R4.md} (100%) rename github-data/pull_requests/{139-Faster R4 quants on Zen4.md => 139 - Faster R4 quants on Zen4.md} (100%) rename github-data/pull_requests/{14-Adding IQ6_K.md => 14 - Adding IQ6_K.md} (100%) rename github-data/pull_requests/{141-Q8_K_R8_ Fastest quantized matrix multiplications.md => 141 - Q8_K_R8_ Fastest quantized matrix multiplications.md} (100%) rename github-data/pull_requests/{142-BF16_R16 - 16 interleaved bf16 rows .md => 142 - BF16_R16 - 16 interleaved bf16 rows.md} (100%) rename github-data/pull_requests/{143-Slightly faster IQ4_XS_R4 on AVX2.md => 143 - Slightly faster IQ4_XS_R4 on AVX2.md} (100%) rename github-data/pull_requests/{144-Slightly faster IQ4_K_R4 on AVX2_Zen4.md => 144 - Slightly faster IQ4_K_R4 on AVX2_Zen4.md} (100%) rename github-data/pull_requests/{145-IQ3_K_R4.md => 145 - IQ3_K_R4.md} (100%) rename github-data/pull_requests/{146-IQ2_K_R4.md => 146 - IQ2_K_R4.md} (100%) rename github-data/pull_requests/{147-Be able to repack tensors at run time.md => 147 - Be able to repack tensors at run time.md} (100%) rename github-data/pull_requests/{148-Slightly better matrix x vector on Zen4_AVX2 for iq2_k_r4, iq3_k_r4, iq4_k_r4.md => 148 - Slightly better matrix x vector on Zen4_AVX2 for iq2_k_r4_ iq3_k_r4_ iq.md} 
(100%) rename github-data/pull_requests/{149-IQ5_K_R4.md => 149 - IQ5_K_R4.md} (100%) rename github-data/pull_requests/{150-IQ4_KS_R4.md => 150 - IQ4_KS_R4.md} (100%) rename github-data/pull_requests/{151-fix typo.md => 151 - fix typo.md} (100%) rename github-data/pull_requests/{152-IQ3_XXS_R4.md => 152 - IQ3_XXS_R4.md} (100%) rename github-data/pull_requests/{153-IQ3_XXS_R4.md => 153 - IQ3_XXS_R4.md} (100%) rename github-data/pull_requests/{154-IQ2_XXS_R4.md => 154 - IQ2_XXS_R4.md} (100%) rename github-data/pull_requests/{155-IQ2_XS_R4.md => 155 - IQ2_XS_R4.md} (100%) rename github-data/pull_requests/{156-IQ2_S_R4.md => 156 - IQ2_S_R4.md} (100%) rename github-data/pull_requests/{157-R4 i-quants improvements.md => 157 - R4 i-quants improvements.md} (100%) rename github-data/pull_requests/{158-Faster R4 legacy quants.md => 158 - Faster R4 legacy quants.md} (100%) rename github-data/pull_requests/{16-Fix Makefile.md => 16 - Fix Makefile.md} (100%) rename github-data/pull_requests/{161-MSVC fixes.md => 161 - MSVC fixes.md} (100%) rename github-data/pull_requests/{162-IQ3_S_R4.md => 162 - IQ3_S_R4.md} (100%) rename github-data/pull_requests/{163-q4_0_r4_ Use AVX2 version for matrix x vector.md => 163 - q4_0_r4_ Use AVX2 version for matrix x vector.md} (100%) rename github-data/pull_requests/{168-Falcon3 changes.md => 168 - Falcon3 changes.md} (100%) rename github-data/pull_requests/{169-Be able to re-quantize MS BitNet I2_S models.md => 169 - Be able to re-quantize MS BitNet I2_S models.md} (100%) rename github-data/pull_requests/{17-Merge mainline - Aug 12 2024.md => 17 - Merge mainline - Aug 12 2024.md} (100%) rename github-data/pull_requests/{170-MoE fix for R4 quants.md => 170 - MoE fix for R4 quants.md} (100%) rename github-data/pull_requests/{171-Fix lower FA performance for even batch sizes.md => 171 - Fix lower FA performance for even batch sizes.md} (100%) rename github-data/pull_requests/{172-CPU Flash Attention improvements.md => 172 - CPU Flash Attention 
improvements.md} (100%) rename github-data/pull_requests/{173-More Flash Attention improvements.md => 173 - More Flash Attention improvements.md} (100%) rename github-data/pull_requests/{174-On Zen4 repack fp16 models to bf16_r16.md => 174 - On Zen4 repack fp16 models to bf16_r16.md} (100%) rename github-data/pull_requests/{175-Better BF16 support on AVX2.md => 175 - Better BF16 support on AVX2.md} (100%) rename github-data/pull_requests/{176-Deepseek V3 support added.md => 176 - Deepseek V3 support added.md} (100%) rename github-data/pull_requests/{177-Update chat templates.md => 177 - Update chat templates.md} (100%) rename github-data/pull_requests/{178-Interleave 8 rows (Q8_0, IQ4_XS).md => 178 - Interleave 8 rows _Q8_0_ IQ4_XS_.md} (100%) rename github-data/pull_requests/{179-Minor performance improvements.md => 179 - Minor performance improvements.md} (100%) rename github-data/pull_requests/{180-Deepseek MLA Optimizations.md => 180 - Deepseek MLA Optimizations.md} (100%) rename github-data/pull_requests/{181-Various.md => 181 - Various.md} (100%) rename github-data/pull_requests/{182-Faster Q4_K_R4 and Q5_K_R4 on AVX2_Zen4.md => 182 - Faster Q4_K_R4 and Q5_K_R4 on AVX2_Zen4.md} (100%) rename github-data/pull_requests/{184-Deepseek-Lite.md => 184 - Deepseek-Lite.md} (100%) rename github-data/pull_requests/{185-IQ1_S_R4_ better 1.5 bpw quants.md => 185 - IQ1_S_R4_ better 1.5 bpw quants.md} (100%) rename github-data/pull_requests/{186-iq1_s_r4_ slightly faster NEON gemm_gemv.md => 186 - iq1_s_r4_ slightly faster NEON gemm_gemv.md} (100%) rename github-data/pull_requests/{187-IQ1_M_R4_ better 1.75 bpw quants.md => 187 - IQ1_M_R4_ better 1.75 bpw quants.md} (100%) rename github-data/pull_requests/{188-Add optional MLA.md => 188 - Add optional MLA.md} (100%) rename github-data/pull_requests/{189-Rename q4_0_r4, q8_0_r4 and iq4_xs_r4 to _r8.md => 189 - Rename q4_0_r4_ q8_0_r4 and iq4_xs_r4 to _r8.md} (100%) rename github-data/pull_requests/{19-Skip barriers of 
noops.md => 19 - Skip barriers of noops.md} (100%) rename github-data/pull_requests/{190-cuda_ non-contiguous rms norm.md => 190 - cuda_ non-contiguous rms norm.md} (100%) rename github-data/pull_requests/{191-Add additional checks for iq1_s_r4 quantization.md => 191 - Add additional checks for iq1_s_r4 quantization.md} (100%) rename github-data/pull_requests/{192-Revert #79.md => 192 - Revert _79.md} (100%) rename github-data/pull_requests/{193-RPC sync.md => 193 - RPC sync.md} (100%) rename github-data/pull_requests/{194-Use Q8_K_128 for IQ1_S_R4 and IQ1_M_R4 matrix multiplications.md => 194 - Use Q8_K_128 for IQ1_S_R4 and IQ1_M_R4 matrix multiplications.md} (100%) rename github-data/pull_requests/{195- Deepseek MLA Optimizations V2.md => 195 - Deepseek MLA Optimizations V2.md} (100%) rename github-data/pull_requests/{197-FA_ Add option to build all FA kernels.md => 197 - FA_ Add option to build all FA kernels.md} (100%) rename github-data/pull_requests/{198- Load all MoE experts during warmup and make warmup 1 token.md => 198 - Load all MoE experts during warmup and make warmup 1 token.md} (100%) rename github-data/pull_requests/{2-Offload Bitnet token embeddings to the GPU - the right way.md => 2 - Offload Bitnet token embeddings to the GPU - the right way.md} (100%) rename github-data/pull_requests/{20-iq2_k_ slightly better bpw - accuracy compromise.md => 20 - iq2_k_ slightly better bpw - accuracy compromise.md} (100%) rename github-data/pull_requests/{200-DeepSeek FA support (CPU only).md => 200 - DeepSeek FA support _CPU only_.md} (100%) rename github-data/pull_requests/{202-Fix imatrix overprotectiveness.md => 202 - Fix imatrix overprotectiveness.md} (100%) rename github-data/pull_requests/{204-Fix iqk_mul_mat on AVX512 systems that are missing BF16 support.md => 204 - Fix iqk_mul_mat on AVX512 systems that are missing BF16 support.md} (100%) rename github-data/pull_requests/{205-Faster MLA prompt processing.md => 205 - Faster MLA prompt processing.md} 
(100%) rename github-data/pull_requests/{206-MLA_ allow Q8_0 K-cache for MLA.md => 206 - MLA_ allow Q8_0 K-cache for MLA.md} (100%) rename github-data/pull_requests/{207-Faster CPU TG for GQA models.md => 207 - Faster CPU TG for GQA models.md} (100%) rename github-data/pull_requests/{208-Q8_KV_ 8-bit quantization type targeting the KV cache.md => 208 - Q8_KV_ 8-bit quantization type targeting the KV cache.md} (100%) rename github-data/pull_requests/{21-quantize_stats_ print rmse and max error as fraction of _x_.md => 21 - quantize_stats_ print rmse and max error as fraction of _x_.md} (100%) rename github-data/pull_requests/{210-Repack also experts.md => 210 - Repack also experts.md} (100%) rename github-data/pull_requests/{212-Optimized GEMM_GEMV for IQ1_S.md => 212 - Optimized GEMM_GEMV for IQ1_S.md} (100%) rename github-data/pull_requests/{213-Fix NEON gemm_gemv for legacy quants when row size is not divisible by 128.md => 213 - Fix NEON gemm_gemv for legacy quants when row size is not divisible by .md} (100%) rename github-data/pull_requests/{215-Trying to fix confusion betweem HAVE_FANCY_SIMD and AVX512.md => 215 - Trying to fix confusion betweem HAVE_FANCY_SIMD and AVX512.md} (100%) rename github-data/pull_requests/{216-Hopefully this really fixes the confusion between AVX512 and FANCY_SIMD.md => 216 - Hopefully this really fixes the confusion between AVX512 and FANCY_SIMD.md} (100%) rename github-data/pull_requests/{218-Better strategy for attention matrix multiplications when generating tokens .md => 218 - Better strategy for attention matrix multiplications when generating to.md} (100%) rename github-data/pull_requests/{219-Fuse MoE up and gate matrix multiplications.md => 219 - Fuse MoE up and gate matrix multiplications.md} (100%) rename github-data/pull_requests/{22-AVX2 quantization for Q8_K.md => 22 - AVX2 quantization for Q8_K.md} (100%) rename github-data/pull_requests/{220-Fix #217.md => 220 - Fix _217.md} (100%) rename 
github-data/pull_requests/{225- Examples _ Add new sweep-bench benchmark.md => 225 - Examples _ Add new sweep-bench benchmark.md} (100%) rename github-data/pull_requests/{226-Fix compilation error with IQK_FA_ALL_QUANTS enabled.md => 226 - Fix compilation error with IQK_FA_ALL_QUANTS enabled.md} (100%) rename github-data/pull_requests/{229-Fused MoE ffn_up and ffn_gate.md => 229 - Fused MoE ffn_up and ffn_gate.md} (100%) rename github-data/pull_requests/{23-iq4_k tweak.md => 23 - iq4_k tweak.md} (100%) rename github-data/pull_requests/{231-Fix #230.md => 231 - Fix _230.md} (100%) rename github-data/pull_requests/{232-Give the user the option to override where model weights are stored.md => 232 - Give the user the option to override where model weights are stored.md} (100%) rename github-data/pull_requests/{233-Slightly faster CUDA MLA.md => 233 - Slightly faster CUDA MLA.md} (100%) rename github-data/pull_requests/{234-Faster MLA on CUDA.md => 234 - Faster MLA on CUDA.md} (100%) rename github-data/pull_requests/{235-Option to use MLA without a transposed cache.md => 235 - Option to use MLA without a transposed cache.md} (100%) rename github-data/pull_requests/{236-Feat_lock free server.md => 236 - Feat_lock free server.md} (100%) rename github-data/pull_requests/{237-Reduce size of compute buffers.md => 237 - Reduce size of compute buffers.md} (100%) rename github-data/pull_requests/{238-A better way to measure the cost of ggml_barrier.md => 238 - A better way to measure the cost of ggml_barrier.md} (100%) rename github-data/pull_requests/{239-SER - Smart Expert Reduction.md => 239 - SER - Smart Expert Reduction.md} (100%) rename github-data/pull_requests/{24-softcap_ minor improvement.md => 24 - softcap_ minor improvement.md} (100%) rename github-data/pull_requests/{240-Flash MLA (CPU only).md => 240 - Flash MLA _CPU only_.md} (100%) rename github-data/pull_requests/{241-DeepSeek CUDA Flash Attention .md => 241 - DeepSeek CUDA Flash Attention.md} (100%) rename 
github-data/pull_requests/{243-Better FlashMLA.md => 243 - Better FlashMLA.md} (100%) rename github-data/pull_requests/{244-Custom quantization rules with regular expressions.md => 244 - Custom quantization rules with regular expressions.md} (100%) rename github-data/pull_requests/{246-Faster FlashMLA prompt processing.md => 246 - Faster FlashMLA prompt processing.md} (100%) rename github-data/pull_requests/{247-FlashMLA on CUDA.md => 247 - FlashMLA on CUDA.md} (100%) rename github-data/pull_requests/{248-Faster MoE token generation on CUDA.md => 248 - Faster MoE token generation on CUDA.md} (100%) rename github-data/pull_requests/{250-DeepSeek imatrix stuff.md => 250 - DeepSeek imatrix stuff.md} (100%) rename github-data/pull_requests/{251-Try using fp32 for FlashMLA.md => 251 - Try using fp32 for FlashMLA.md} (100%) rename github-data/pull_requests/{252-MLA-2_ Allow usage of q8_0 for KV cache on CUDA.md => 252 - MLA-2_ Allow usage of q8_0 for KV cache on CUDA.md} (100%) rename github-data/pull_requests/{253-FlashMLA-2 (CPU)_ faster and smaller compute buffer size.md => 253 - FlashMLA-2 _CPU_ faster and smaller compute buffer size.md} (100%) rename github-data/pull_requests/{259-Prepare wk_b tensors of DeepSeek models on the fly.md => 259 - Prepare wk_b tensors of DeepSeek models on the fly.md} (100%) rename github-data/pull_requests/{260-FlashMLA-2_ reduce compute buffer size (CUDA and CPU).md => 260 - FlashMLA-2_ reduce compute buffer size _CUDA and CPU_.md} (100%) rename github-data/pull_requests/{261-Compile time option to use bf16 for quants without MMQ kernels.md => 261 - Compile time option to use bf16 for quants without MMQ kernels.md} (100%) rename github-data/pull_requests/{262-Fix #261.md => 262 - Fix _261.md} (100%) rename github-data/pull_requests/{264-Make Q8_0 KV cache work with FlasMLA-2 on CUDA.md => 264 - Make Q8_0 KV cache work with FlasMLA-2 on CUDA.md} (100%) rename github-data/pull_requests/{265-Allow q8_0 cache on the CPU for FlashMLA-2.md 
=> 265 - Allow q8_0 cache on the CPU for FlashMLA-2.md} (100%) rename github-data/pull_requests/{268-Prevent FlashMLA-1 from running on CUDA.md => 268 - Prevent FlashMLA-1 from running on CUDA.md} (100%) rename github-data/pull_requests/{269-Fix ggml_compute_forward_dup_q.md => 269 - Fix ggml_compute_forward_dup_q.md} (100%) rename github-data/pull_requests/{27-Faster Gemma2.md => 27 - Faster Gemma2.md} (100%) rename github-data/pull_requests/{270-Honor mmap setting when using tensor overrides.md => 270 - Honor mmap setting when using tensor overrides.md} (100%) rename github-data/pull_requests/{272-Convert models to row-interleaved quants using the quantize tool.md => 272 - Convert models to row-interleaved quants using the quantize tool.md} (100%) rename github-data/pull_requests/{273-FlashMLA-3_ the best of both worlds (CPU only).md => 273 - FlashMLA-3_ the best of both worlds _CPU only_.md} (100%) rename github-data/pull_requests/{274-Specify tensor name regex for tensors to be repacked.md => 274 - Specify tensor name regex for tensors to be repacked.md} (100%) rename github-data/pull_requests/{275-Fix bug_ missing parentheses in logical expression.md => 275 - Fix bug_ missing parentheses in logical expression.md} (100%) rename github-data/pull_requests/{276-Add Gemma3 support (text only).md => 276 - Add Gemma3 support _text only_.md} (100%) rename github-data/pull_requests/{277-Attempt to improve FlashMLA on the CPU.md => 277 - Attempt to improve FlashMLA on the CPU.md} (100%) rename github-data/pull_requests/{278-Test transparent huge pages on Linux.md => 278 - Test transparent huge pages on Linux.md} (100%) rename github-data/pull_requests/{279-Fighting with cmake.md => 279 - Fighting with cmake.md} (100%) rename github-data/pull_requests/{28-Binary KQ mask.md => 28 - Binary KQ mask.md} (100%) rename github-data/pull_requests/{280-Native build ooption for CUDA when GGML_NATIVE is set.md => 280 - Native build ooption for CUDA when GGML_NATIVE is set.md} 
(100%) rename github-data/pull_requests/{282-Improve DeepSeek batched processing speed.md => 282 - Improve DeepSeek batched processing speed.md} (100%) rename github-data/pull_requests/{283-CUDA_ better MoE implementation.md => 283 - CUDA_ better MoE implementation.md} (100%) rename github-data/pull_requests/{284-llama-bench_ enable having different number of threads for tg and pp.md => 284 - llama-bench_ enable having different number of threads for tg and pp.md} (100%) rename github-data/pull_requests/{287-Is this better for DeepSeek-R1_.md => 287 - Is this better for DeepSeek-R1_.md} (100%) rename github-data/pull_requests/{289-Update sweep bench (depracating .jsonl support).md => 289 - Update sweep bench _depracating .jsonl support_.md} (100%) rename github-data/pull_requests/{290-mmap backed KV cache.md => 290 - mmap backed KV cache.md} (100%) rename github-data/pull_requests/{291-Disable Zen4 optimizations for Q8_0_Q8_0_R8.md => 291 - Disable Zen4 optimizations for Q8_0_Q8_0_R8.md} (100%) rename github-data/pull_requests/{292-Use bf16 instead of fp16 block scales for q8_1.md => 292 - Use bf16 instead of fp16 block scales for q8_1.md} (100%) rename github-data/pull_requests/{294-Make sure tensor row size is multiple of block size also when quantizing with --pure.md => 294 - Make sure tensor row size is multiple of block size also when quantizin.md} (100%) rename github-data/pull_requests/{295-Quantization improvements.md => 295 - Quantization improvements.md} (100%) rename github-data/pull_requests/{298-Update gguf-py constants.md => 298 - Update gguf-py constants.md} (100%) rename github-data/pull_requests/{299-Additional guards for interleaved quants.md => 299 - Additional guards for interleaved quants.md} (100%) rename github-data/pull_requests/{3-Merge mainline llama.cpp.md => 3 - Merge mainline llama.cpp.md} (100%) rename github-data/pull_requests/{301-Fix #300.md => 301 - Fix _300.md} (100%) rename github-data/pull_requests/{302-Quantization improvements 
(2).md => 302 - Quantization improvements _2_.md} (100%) rename github-data/pull_requests/{303-Fix ARM_NEON build failure due to q8_2.md => 303 - Fix ARM_NEON build failure due to q8_2.md} (100%) rename github-data/pull_requests/{307-Metal_ much faster MoE prompt processing.md => 307 - Metal_ much faster MoE prompt processing.md} (100%) rename github-data/pull_requests/{309-Fix GCC compilation errors on ARM.md => 309 - Fix GCC compilation errors on ARM.md} (100%) rename github-data/pull_requests/{31-Fix build when iqk_mul_mat is disabled.md => 31 - Fix build when iqk_mul_mat is disabled.md} (100%) rename github-data/pull_requests/{310-Metal_ FA and FlashMLA.md => 310 - Metal_ FA and FlashMLA.md} (100%) rename github-data/pull_requests/{311-Add -flax-vector-conversions for GCC on ARM.md => 311 - Add -flax-vector-conversions for GCC on ARM.md} (100%) rename github-data/pull_requests/{312-Improved IQ2_XS quantization.md => 312 - Improved IQ2_XS quantization.md} (100%) rename github-data/pull_requests/{313-We need to synchronize before using device to host async memcpy.md => 313 - We need to synchronize before using device to host async memcpy.md} (100%) rename github-data/pull_requests/{315-Try not repacking q8_0 for FA computations.md => 315 - Try not repacking q8_0 for FA computations.md} (100%) rename github-data/pull_requests/{317-Add copyright notices.md => 317 - Add copyright notices.md} (100%) rename github-data/pull_requests/{318-Use links for ggml_llama.cpp authors.md => 318 - Use links for ggml_llama.cpp authors.md} (100%) rename github-data/pull_requests/{32-Zen4 Flash Attention.md => 32 - Zen4 Flash Attention.md} (100%) rename github-data/pull_requests/{320-Guard against attempts to use MLA for non-MLA models.md => 320 - Guard against attempts to use MLA for non-MLA models.md} (100%) rename github-data/pull_requests/{321-LlaMA-4 support (text only).md => 321 - LlaMA-4 support _text only_.md} (100%) rename github-data/pull_requests/{324-Correct L4 
rms_norm.md => 324 - Correct L4 rms_norm.md} (100%) rename github-data/pull_requests/{325-Fix KLD precision.md => 325 - Fix KLD precision.md} (100%) rename github-data/pull_requests/{326-WIP Compute per layer LIM Scores during imatrix.md => 326 - WIP Compute per layer LIM Scores during imatrix.md} (100%) rename github-data/pull_requests/{327-Improved IQ1_M quantization.md => 327 - Improved IQ1_M quantization.md} (100%) rename github-data/pull_requests/{328-imatrix_ collect layer influence statistics.md => 328 - imatrix_ collect layer influence statistics.md} (100%) rename github-data/pull_requests/{329-Add ability to hide imatrix details in llama-quantize.md => 329 - Add ability to hide imatrix details in llama-quantize.md} (100%) rename github-data/pull_requests/{33-Do not process prompts containing binary data for escapes.md => 33 - Do not process prompts containing binary data for escapes.md} (100%) rename github-data/pull_requests/{330-Allow q8_0 KV cache for head size 256.md => 330 - Allow q8_0 KV cache for head size 256.md} (100%) rename github-data/pull_requests/{331-Better gemm_gemv on AVX2 fr q4_0_r8.md => 331 - Better gemm_gemv on AVX2 fr q4_0_r8.md} (100%) rename github-data/pull_requests/{332-Better TG performance for GQA models (CPU).md => 332 - Better TG performance for GQA models _CPU_.md} (100%) rename github-data/pull_requests/{333-Support GLM-4-0414 models based on piDack's mainline PR.md => 333 - Support GLM-4-0414 models based on piDack_s mainline PR.md} (100%) rename github-data/pull_requests/{336-Fix termux_android build.md => 336 - Fix termux_android build.md} (100%) rename github-data/pull_requests/{337-Add support for bitnet2b_2501 model.md => 337 - Add support for bitnet2b_2501 model.md} (100%) rename github-data/pull_requests/{338-BitNet adjustments.md => 338 - BitNet adjustments.md} (100%) rename github-data/pull_requests/{341-Add support for Cohere2.md => 341 - Add support for Cohere2.md} (100%) rename github-data/pull_requests/{342-Fix 
LLaMA-4 attention.md => 342 - Fix LLaMA-4 attention.md} (100%) rename github-data/pull_requests/{343-cuda_ use switch in constexpr funcs.md => 343 - cuda_ use switch in constexpr funcs.md} (100%) rename github-data/pull_requests/{344-Add GLM-4-0414 Model Support.md => 344 - Add GLM-4-0414 Model Support.md} (100%) rename github-data/pull_requests/{346-Fix FA on ARM CPUs.md => 346 - Fix FA on ARM CPUs.md} (100%) rename github-data/pull_requests/{347-Add ability to manually set arch flags.md => 347 - Add ability to manually set arch flags.md} (100%) rename github-data/pull_requests/{348-Fix q4_1 and q5_1 on Arm.md => 348 - Fix q4_1 and q5_1 on Arm.md} (100%) rename github-data/pull_requests/{349-Fix division by zero bug.md => 349 - Fix division by zero bug.md} (100%) rename github-data/pull_requests/{35-Fix Zen4 Flash Attention.md => 35 - Fix Zen4 Flash Attention.md} (100%) rename github-data/pull_requests/{351-CPU FA improvements .md => 351 - CPU FA improvements.md} (100%) rename github-data/pull_requests/{352-Update README.md.md => 352 - Update README.md.md} (100%) rename github-data/pull_requests/{355-Apply Qwen3 PR from llama.cpp.md => 355 - Apply Qwen3 PR from llama.cpp.md} (100%) rename github-data/pull_requests/{356-Add missing enum values for qwen3 and qwen3moe.md => 356 - Add missing enum values for qwen3 and qwen3moe.md} (100%) rename github-data/pull_requests/{36-Zen4 Flash Attnetion 2.md => 36 - Zen4 Flash Attnetion 2.md} (100%) rename github-data/pull_requests/{360-Fix IQK_FA_ALL_QUANTS on AVX2.md => 360 - Fix IQK_FA_ALL_QUANTS on AVX2.md} (100%) rename github-data/pull_requests/{364-Fix FA bug on AVX2.md => 364 - Fix FA bug on AVX2.md} (100%) rename github-data/pull_requests/{366-Add support for new Bitnet model architecture name.md => 366 - Add support for new Bitnet model architecture name.md} (100%) rename github-data/pull_requests/{368-Trying to fix iq1_s_r4_iq1_m_r4 quantization failure.md => 368 - Trying to fix iq1_s_r4_iq1_m_r4 quantization 
failure.md} (100%) rename github-data/pull_requests/{369-cmake_ force MSVC compiler charset to utf-8.md => 369 - cmake_ force MSVC compiler charset to utf-8.md} (100%) rename github-data/pull_requests/{37-Performance improvements for legacy quants on ARM_NEON.md => 37 - Performance improvements for legacy quants on ARM_NEON.md} (100%) rename github-data/pull_requests/{370-CUDA_ faster FA TG for GQA models .md => 370 - CUDA_ faster FA TG for GQA models.md} (100%) rename github-data/pull_requests/{371-Another attempt to fix #367.md => 371 - Another attempt to fix _367.md} (100%) rename github-data/pull_requests/{374-CUDA_ MMQ for IQ4_KS.md => 374 - CUDA_ MMQ for IQ4_KS.md} (100%) rename github-data/pull_requests/{375-Add batch warmup to sweep-bench.md => 375 - Add batch warmup to sweep-bench.md} (100%) rename github-data/pull_requests/{377-Support for Llama-3-Nemotron models.md => 377 - Support for Llama-3-Nemotron models.md} (100%) rename github-data/pull_requests/{38-Zen4 Flash Attention - bf16 support.md => 38 - Zen4 Flash Attention - bf16 support.md} (100%) rename github-data/pull_requests/{382-Fix DeepSeek FA.md => 382 - Fix DeepSeek FA.md} (100%) rename github-data/pull_requests/{386-FlashMLA-3 for DeepSeek models on CUDA.md => 386 - FlashMLA-3 for DeepSeek models on CUDA.md} (100%) rename github-data/pull_requests/{39-Add support for bf16 to iqk_mul_mat.md => 39 - Add support for bf16 to iqk_mul_mat.md} (100%) rename github-data/pull_requests/{390-Fix build for Xeon Gold 6226R.md => 390 - Fix build for Xeon Gold 6226R.md} (100%) rename github-data/pull_requests/{391-Fix DeepSeek q8_0 cache.md => 391 - Fix DeepSeek q8_0 cache.md} (100%) rename github-data/pull_requests/{392-fix some MSVC build problem..md => 392 - fix some MSVC build problem..md} (100%) rename github-data/pull_requests/{394-Handle incompatible DeepSeek GGUFs.md => 394 - Handle incompatible DeepSeek GGUFs.md} (100%) rename github-data/pull_requests/{4-Simdify and multi-thread tanh.md => 4 - 
Simdify and multi-thread tanh.md} (100%) rename github-data/pull_requests/{40-Adding bf16 support to CUDA.md => 40 - Adding bf16 support to CUDA.md} (100%) rename github-data/pull_requests/{400-Fix CUDA DeepSeek FlashMLA-3 with quantized KV cache.md => 400 - Fix CUDA DeepSeek FlashMLA-3 with quantized KV cache.md} (100%) rename github-data/pull_requests/{402-Fix missing rope_freqs with convert_hf_to_gguf.md => 402 - Fix missing rope_freqs with convert_hf_to_gguf.md} (100%) rename github-data/pull_requests/{404-TG improvements for MoE models.md => 404 - TG improvements for MoE models.md} (100%) rename github-data/pull_requests/{405-GPU offload policy.md => 405 - GPU offload policy.md} (100%) rename github-data/pull_requests/{406-Fix race in the CUDA DeepSeek FA kernel.md => 406 - Fix race in the CUDA DeepSeek FA kernel.md} (100%) rename github-data/pull_requests/{408-Faster DeepSeek FA on CUDA.md => 408 - Faster DeepSeek FA on CUDA.md} (100%) rename github-data/pull_requests/{409-Enable faster prompt processing with mainline llama.cpp GGUFs.md => 409 - Enable faster prompt processing with mainline llama.cpp GGUFs.md} (100%) rename github-data/pull_requests/{41-iqk_mul_mat(ARM_NEON)_ adding bf16 support.md => 41 - iqk_mul_mat_ARM_NEON_ adding bf16 support.md} (100%) rename github-data/pull_requests/{410-Better CPU FA performance for DeepSeek-Lite.md => 410 - Better CPU FA performance for DeepSeek-Lite.md} (100%) rename github-data/pull_requests/{411-Fix imatrix calculation for MLA models.md => 411 - Fix imatrix calculation for MLA models.md} (100%) rename github-data/pull_requests/{413-Fix new CUDA FA on Touring.md => 413 - Fix new CUDA FA on Touring.md} (100%) rename github-data/pull_requests/{415-Fix SER (CPU).md => 415 - Fix SER _CPU_.md} (100%) rename github-data/pull_requests/{416-Fix SER (CUDA).md => 416 - Fix SER _CUDA_.md} (100%) rename github-data/pull_requests/{417-CUDA_ quantized GEMM for for IQ4_K, IQ5_K, IQ6_K .md => 417 - CUDA_ quantized GEMM for for 
IQ4_K_ IQ5_K_ IQ6_K.md} (100%) rename github-data/pull_requests/{418-CUDA_ quantized GEMM for for IQ2_KS, IQ2_K, IQ3_K.md => 418 - CUDA_ quantized GEMM for for IQ2_KS_ IQ2_K_ IQ3_K.md} (100%) rename github-data/pull_requests/{42-Adding fused rms_norm.md => 42 - Adding fused rms_norm.md} (100%) rename github-data/pull_requests/{421-Fix standard attention on the CPU.md => 421 - Fix standard attention on the CPU.md} (100%) rename github-data/pull_requests/{422-Adding IQ5_KS - 5.25 bpw quants.md => 422 - Adding IQ5_KS - 5.25 bpw quants.md} (100%) rename github-data/pull_requests/{424-Adding forgotten template instance for iq5_ks.md => 424 - Adding forgotten template instance for iq5_ks.md} (100%) rename github-data/pull_requests/{426-IQ5_KS_R4_ row-interleaved IQ5_KS.md => 426 - IQ5_KS_R4_ row-interleaved IQ5_KS.md} (100%) rename github-data/pull_requests/{427-Fix AVX2 implementation of IQ4_K, IQ4_KS, IQ5_K, IQ6_K.md => 427 - Fix AVX2 implementation of IQ4_K_ IQ4_KS_ IQ5_K_ IQ6_K.md} (100%) rename github-data/pull_requests/{428-Zen4_ Faster PP for IQ2_KS, IQ4_KS, IQ5_KS.md => 428 - Zen4_ Faster PP for IQ2_KS_ IQ4_KS_ IQ5_KS.md} (100%) rename github-data/pull_requests/{429-Option to enable or disable the CPU FA kernels.md => 429 - Option to enable or disable the CPU FA kernels.md} (100%) rename github-data/pull_requests/{43-iq2_tn_ slightly faster PP on Zen4.md => 43 - iq2_tn_ slightly faster PP on Zen4.md} (100%) rename github-data/pull_requests/{430-Disable multi-add for now.md => 430 - Disable multi-add for now.md} (100%) rename github-data/pull_requests/{431-Forgotten MMQ ref and typo.md => 431 - Forgotten MMQ ref and typo.md} (100%) rename github-data/pull_requests/{435-Refactor iqk_mul_mat.cpp.md => 435 - Refactor iqk_mul_mat.cpp.md} (100%) rename github-data/pull_requests/{438-Another attempt to fix the illegal memory access bug.md => 438 - Another attempt to fix the illegal memory access bug.md} (100%) rename github-data/pull_requests/{439-Bug fixes from 
mainline.md => 439 - Bug fixes from mainline.md} (100%) rename github-data/pull_requests/{44-Adding IQ1_TN - 1.6875 bpw for TriLM ternary models.md => 44 - Adding IQ1_TN - 1.6875 bpw for TriLM ternary models.md} (100%) rename github-data/pull_requests/{441-Trellis quants with CPU inference.md => 441 - Trellis quants with CPU inference.md} (100%) rename github-data/pull_requests/{442-CUDA call tracer.md => 442 - CUDA call tracer.md} (100%) rename github-data/pull_requests/{443-Streamline a bit the quant strategies.md => 443 - Streamline a bit the quant strategies.md} (100%) rename github-data/pull_requests/{444-gguf-split _ update.md => 444 - gguf-split _ update.md} (100%) rename github-data/pull_requests/{445-Fix typo in non-AVX2 code branch.md => 445 - Fix typo in non-AVX2 code branch.md} (100%) rename github-data/pull_requests/{446-Fix bug in MMVQ kernel.md => 446 - Fix bug in MMVQ kernel.md} (100%) rename github-data/pull_requests/{448-Fix MSVC compilation.md => 448 - Fix MSVC compilation.md} (100%) rename github-data/pull_requests/{449-Legacy quants conversion schemes in convert_hf_to_gguf.py.md => 449 - Legacy quants conversion schemes in convert_hf_to_gguf.py.md} (100%) rename github-data/pull_requests/{45-Add CUDA support for IQ1_TN.md => 45 - Add CUDA support for IQ1_TN.md} (100%) rename github-data/pull_requests/{453-Faster IQ3_KT and IQ4_KT.md => 453 - Faster IQ3_KT and IQ4_KT.md} (100%) rename github-data/pull_requests/{454-Add support for FP8 GGUF creation and re-quantization (WIP).md => 454 - Add support for FP8 GGUF creation and re-quantization _WIP_.md} (100%) rename github-data/pull_requests/{457-Remove GGML_IQK_MUL_MAT option.md => 457 - Remove GGML_IQK_MUL_MAT option.md} (100%) rename github-data/pull_requests/{458-Add missing gguf-py constants.md => 458 - Add missing gguf-py constants.md} (100%) rename github-data/pull_requests/{46-IQ1_TN Metal implementation.md => 46 - IQ1_TN Metal implementation.md} (100%) rename 
github-data/pull_requests/{460-aarch64 kernels for KT quants.md => 460 - aarch64 kernels for KT quants.md} (100%) rename github-data/pull_requests/{461-CUDA implementation for IQ2_K_R4, IQ3_K_R4, IQ4_K_R4, IQ5_K_R4.md => 461 - CUDA implementation for IQ2_K_R4_ IQ3_K_R4_ IQ4_K_R4_ IQ5_K_R4.md} (100%) rename github-data/pull_requests/{462-CUDA GEMM and GEMV for IQ4_KS_R4 and IQ5_KS_R4.md => 462 - CUDA GEMM and GEMV for IQ4_KS_R4 and IQ5_KS_R4.md} (100%) rename github-data/pull_requests/{465-Set cache_prompt default to true.md => 465 - Set cache_prompt default to true.md} (100%) rename github-data/pull_requests/{468-Minor (~2%) iq2_ks TG performance improvement on CUDA.md => 468 - Minor _2_ iq2_ks TG performance improvement on CUDA.md} (100%) rename github-data/pull_requests/{469-Replace MLA-specific KV cache with the standard KV cache.md => 469 - Replace MLA-specific KV cache with the standard KV cache.md} (100%) rename github-data/pull_requests/{47-iq2_tn_ slightly better performance on AVX2.md => 47 - iq2_tn_ slightly better performance on AVX2.md} (100%) rename github-data/pull_requests/{470-Send [DONE] for OAI compatibility.md => 470 - Send _DONE_ for OAI compatibility.md} (100%) rename github-data/pull_requests/{471-NEON implementation for trellis quants.md => 471 - NEON implementation for trellis quants.md} (100%) rename github-data/pull_requests/{473-Replace MLA-specific KV cache with the standard KV cache V2.md => 473 - Replace MLA-specific KV cache with the standard KV cache V2.md} (100%) rename github-data/pull_requests/{475-Metal implementatio for the trellis quants..md => 475 - Metal implementatio for the trellis quants..md} (100%) rename github-data/pull_requests/{478-forgotten refs and typo.md => 478 - forgotten refs and typo.md} (100%) rename github-data/pull_requests/{48-AVX2 Flash Attention.md => 48 - AVX2 Flash Attention.md} (100%) rename github-data/pull_requests/{480-Rpc improvement.md => 480 - Rpc improvement.md} (100%) rename 
github-data/pull_requests/{481-Webui improvement.md => 481 - Webui improvement.md} (100%) rename github-data/pull_requests/{482-Trellis quants_ faster CPU prompt processing.md => 482 - Trellis quants_ faster CPU prompt processing.md} (100%) rename github-data/pull_requests/{483- convert_hf_to_gguf.py _ conversion from hf weights to Q6_0.md => 483 - convert_hf_to_gguf.py _ conversion from hf weights to Q6_0.md} (100%) rename github-data/pull_requests/{484-BF16 Trellis implementation.md => 484 - BF16 Trellis implementation.md} (100%) rename github-data/pull_requests/{486-Adding the XTC sampler.md => 486 - Adding the XTC sampler.md} (100%) rename github-data/pull_requests/{487-Make sure MMVQ is supported before using it.md => 487 - Make sure MMVQ is supported before using it.md} (100%) rename github-data/pull_requests/{488-Faster CPU prompt processing for Trellis quants and MoE models.md => 488 - Faster CPU prompt processing for Trellis quants and MoE models.md} (100%) rename github-data/pull_requests/{489-Adding top-n-sigma sampler.md => 489 - Adding top-n-sigma sampler.md} (100%) rename github-data/pull_requests/{49-ARM_NEON Flash Attention.md => 49 - ARM_NEON Flash Attention.md} (100%) rename github-data/pull_requests/{492-CUDA implementation for IQ1_S_R4.md => 492 - CUDA implementation for IQ1_S_R4.md} (100%) rename github-data/pull_requests/{493-MMQ implementation for IQ4_KS_R4 and IQ5_KS_R4.md => 493 - MMQ implementation for IQ4_KS_R4 and IQ5_KS_R4.md} (100%) rename github-data/pull_requests/{494-IQ1_M_R4 CUDA implementation.md => 494 - IQ1_M_R4 CUDA implementation.md} (100%) rename github-data/pull_requests/{495-Check if ffn_up and ffn_gate are of the same type before using fmoe.md => 495 - Check if ffn_up and ffn_gate are of the same type before using fmoe.md} (100%) rename github-data/pull_requests/{496-Quick hack_ add the MLA flag to llama_hparams.md => 496 - Quick hack_ add the MLA flag to llama_hparams.md} (100%) rename github-data/pull_requests/{497-Make 
prompt cache saving and restoring MLA aware.md => 497 - Make prompt cache saving and restoring MLA aware.md} (100%) rename github-data/pull_requests/{5-Fusing a mat mul op followed by a scale op on the CPU.md => 5 - Fusing a mat mul op followed by a scale op on the CPU.md} (100%) rename github-data/pull_requests/{50-AVX2 Flash Attention 2.md => 50 - AVX2 Flash Attention 2.md} (100%) rename github-data/pull_requests/{501-Fix #499.md => 501 - Fix _499.md} (100%) rename github-data/pull_requests/{502-Add an endpoint that lists all the saved prompt caches to server.md => 502 - Add an endpoint that lists all the saved prompt caches to server.md} (100%) rename github-data/pull_requests/{504-Add DRY and fix the server to use other new samplers..md => 504 - Add DRY and fix the server to use other new samplers..md} (100%) rename github-data/pull_requests/{505-New IQ4_KT trellis implementation.md => 505 - New IQ4_KT trellis implementation.md} (100%) rename github-data/pull_requests/{506-Fix non rpc build error.md => 506 - Fix non rpc build error.md} (100%) rename github-data/pull_requests/{508-Fix Compile error (C2668).md => 508 - Fix Compile error _C2668_.md} (100%) rename github-data/pull_requests/{509-Docs update.md => 509 - Docs update.md} (100%) rename github-data/pull_requests/{51-Quantized Flash Attention for all supported CPU platforms.md => 51 - Quantized Flash Attention for all supported CPU platforms.md} (100%) rename github-data/pull_requests/{510-Update News section of readme.md => 510 - Update News section of readme.md} (100%) rename github-data/pull_requests/{511-New IQ2_KT.md => 511 - New IQ2_KT.md} (100%) rename github-data/pull_requests/{512-Add top n sigma sampler in webui and other webui fix.md => 512 - Add top n sigma sampler in webui and other webui fix.md} (100%) rename github-data/pull_requests/{513-add dry sampler.md => 513 - add dry sampler.md} (100%) rename github-data/pull_requests/{515-IQ2_XXS_ much faster CPU prompt processing.md => 515 - 
IQ2_XXS_ much faster CPU prompt processing.md} (100%) rename github-data/pull_requests/{516-Much faster iq3_xxs GEMM via repacking to q8_0_r8 (AVX2).md => 516 - Much faster iq3_xxs GEMM via repacking to q8_0_r8 _AVX2_.md} (100%) rename github-data/pull_requests/{517-IQ1_S_ much faster CPU prompt processing.md => 517 - IQ1_S_ much faster CPU prompt processing.md} (100%) rename github-data/pull_requests/{518-IQ3_S_ much faster CPU prompt processing.md => 518 - IQ3_S_ much faster CPU prompt processing.md} (100%) rename github-data/pull_requests/{52-Fix bug and D _ 128 case for Q8_0 k-cache.md => 52 - Fix bug and D _ 128 case for Q8_0 k-cache.md} (100%) rename github-data/pull_requests/{520-Better strategy for GPU offload.md => 520 - Better strategy for GPU offload.md} (100%) rename github-data/pull_requests/{524-Perhaps a slightly better GEMV version for IQ2_XXS, IQ3_XXS, IQ3_S.md => 524 - Perhaps a slightly better GEMV version for IQ2_XXS_ IQ3_XXS_ IQ3_S.md} (100%) rename github-data/pull_requests/{525-Faster CPU prompt processing for Q4_K and Q5_K.md => 525 - Faster CPU prompt processing for Q4_K and Q5_K.md} (100%) rename github-data/pull_requests/{528-Fix bug introduced in #524_#525.md => 528 - Fix bug introduced in _524_525.md} (100%) rename github-data/pull_requests/{529-New IQ2_KT, IQ3_KT and IQ4_KT, V2.md => 529 - New IQ2_KT_ IQ3_KT and IQ4_KT_ V2.md} (100%) rename github-data/pull_requests/{53-Quantization mixes tweaks.md => 53 - Quantization mixes tweaks.md} (100%) rename github-data/pull_requests/{531-Much faster CPU prompt processing (part 1).md => 531 - Much faster CPU prompt processing _part 1_.md} (100%) rename github-data/pull_requests/{533-Much faster CPU prompt processing (part 2).md => 533 - Much faster CPU prompt processing _part 2_.md} (100%) rename github-data/pull_requests/{534-Much faster CPU prompt processing (part 3).md => 534 - Much faster CPU prompt processing _part 3_.md} (100%) rename github-data/pull_requests/{535-Minor readme update.md 
=> 535 - Minor readme update.md} (100%) rename github-data/pull_requests/{536-Fix KT Neon _ ARM typo.md => 536 - Fix KT Neon _ ARM typo.md} (100%) rename github-data/pull_requests/{537-Update CMakeLists.txt to fix NDEBUG handling.md => 537 - Update CMakeLists.txt to fix NDEBUG handling.md} (100%) rename github-data/pull_requests/{54-Improve Q4_0 and Q8_0 performance on AVX2_Zen4.md => 54 - Improve Q4_0 and Q8_0 performance on AVX2_Zen4.md} (100%) rename github-data/pull_requests/{540-Fix missed block_q8_x2 bf16 -_ i16 change.md => 540 - Fix missed block_q8_x2 bf16 -_ i16 change.md} (100%) rename github-data/pull_requests/{541-Perhaps slightly faster trellis quants.md => 541 - Perhaps slightly faster trellis quants.md} (100%) rename github-data/pull_requests/{542-Fix NEON build.md => 542 - Fix NEON build.md} (100%) rename github-data/pull_requests/{544-New integer trellis on ARM_NEON .md => 544 - New integer trellis on ARM_NEON.md} (100%) rename github-data/pull_requests/{546-Faster ARM_NEON GEMM implementation for legacy quants.md => 546 - Faster ARM_NEON GEMM implementation for legacy quants.md} (100%) rename github-data/pull_requests/{547-build_ add script to simplify build&test workflow for Android.md => 547 - build_ add script to simplify build_test workflow for Android.md} (100%) rename github-data/pull_requests/{549-Much faster prompt processing for IQK quants (ARM_NEON).md => 549 - Much faster prompt processing for IQK quants _ARM_NEON_.md} (100%) rename github-data/pull_requests/{55-Improve Q5_0 performance on AVX2.md => 55 - Improve Q5_0 performance on AVX2.md} (100%) rename github-data/pull_requests/{550-Much faster prompt processing for I-quants (ARM_NEON).md => 550 - Much faster prompt processing for I-quants _ARM_NEON_.md} (100%) rename github-data/pull_requests/{552-Much faster prompt processing for k-quants (ARM_NEON).md => 552 - Much faster prompt processing for k-quants _ARM_NEON_.md} (100%) rename github-data/pull_requests/{553-Much faster prompt 
processing for IQ1_S and IQ1_M on ARM_NEON.md => 553 - Much faster prompt processing for IQ1_S and IQ1_M on ARM_NEON.md} (100%) rename github-data/pull_requests/{554-Update README.md to add quickstart section.md => 554 - Update README.md to add quickstart section.md} (100%) rename github-data/pull_requests/{555-Add Falcon-Edge support.md => 555 - Add Falcon-Edge support.md} (100%) rename github-data/pull_requests/{557-CUDA_ MMQ for iqX_r4 quants .md => 557 - CUDA_ MMQ for iqX_r4 quants.md} (100%) rename github-data/pull_requests/{558-Add mikupad to ik_llama as an alternative WebUI.md => 558 - Add mikupad to ik_llama as an alternative WebUI.md} (100%) rename github-data/pull_requests/{559-Use cuBLAS for large batches and quants with block size 16.md => 559 - Use cuBLAS for large batches and quants with block size 16.md} (100%) rename github-data/pull_requests/{56-BF16 support on Metal.md => 56 - BF16 support on Metal.md} (100%) rename github-data/pull_requests/{560-Remove what appears to be unnecessary asserts in ggml_cuda_cpy.md => 560 - Remove what appears to be unnecessary asserts in ggml_cuda_cpy.md} (100%) rename github-data/pull_requests/{563-Merge vulkan code from mainline up to commit of 6_28_2025.md => 563 - Merge vulkan code from mainline up to commit of 6_28_2025.md} (100%) rename github-data/pull_requests/{565-add hunyuan moe support for 561.md => 565 - add hunyuan moe support for 561.md} (100%) rename github-data/pull_requests/{566-Adding IQ3_KS quants.md => 566 - Adding IQ3_KS quants.md} (100%) rename github-data/pull_requests/{567-Minor CUDA PP speed improvement.md => 567 - Minor CUDA PP speed improvement.md} (100%) rename github-data/pull_requests/{569-Conditionally disable fused ops when building with Vulkan enabled.md => 569 - Conditionally disable fused ops when building with Vulkan enabled.md} (100%) rename github-data/pull_requests/{57-AVX2_Zen4 horizontal sums .md => 57 - AVX2_Zen4 horizontal sums.md} (100%) rename 
github-data/pull_requests/{570-Remove duplicate_misplaced cmake find_package for Vulkan.md => 570 - Remove duplicate_misplaced cmake find_package for Vulkan.md} (100%) rename github-data/pull_requests/{571-Fix CMakeLists.md => 571 - Fix CMakeLists.md} (100%) rename github-data/pull_requests/{573-Support for dots.llm1 models.md => 573 - Support for dots.llm1 models.md} (100%) rename github-data/pull_requests/{574-Change KQ mask padding to 64.md => 574 - Change KQ mask padding to 64.md} (100%) rename github-data/pull_requests/{577-Vulkan_ fused rms norm.md => 577 - Vulkan_ fused rms norm.md} (100%) rename github-data/pull_requests/{578-Do not crash when there is no DRY sampler.md => 578 - Do not crash when there is no DRY sampler.md} (100%) rename github-data/pull_requests/{579-Fix debug build failure with RPC off.md => 579 - Fix debug build failure with RPC off.md} (100%) rename github-data/pull_requests/{58-Fix compiler warnings.md => 58 - Fix compiler warnings.md} (100%) rename github-data/pull_requests/{580-Vulkan_ add GGML_OP_FUSED_MUL_UNARY.md => 580 - Vulkan_ add GGML_OP_FUSED_MUL_UNARY.md} (100%) rename github-data/pull_requests/{581-Vulkan_ Disable multi-add for now.md => 581 - Vulkan_ Disable multi-add for now.md} (100%) rename github-data/pull_requests/{582-Vulkan_ adding GGML_OP_MULTI_ADD implementation.md => 582 - Vulkan_ adding GGML_OP_MULTI_ADD implementation.md} (100%) rename github-data/pull_requests/{583-Adding forgotten file.md => 583 - Adding forgotten file.md} (100%) rename github-data/pull_requests/{584-Vulkan_ flash attention for DeepSeek models.md => 584 - Vulkan_ flash attention for DeepSeek models.md} (100%) rename github-data/pull_requests/{585-Special handling of Seed Coder FIM tokens.md => 585 - Special handling of Seed Coder FIM tokens.md} (100%) rename github-data/pull_requests/{587-Fix crash when there is no DRY sampler.md => 587 - Fix crash when there is no DRY sampler.md} (100%) rename github-data/pull_requests/{588-Fix server crash 
when there is no DRY sampler.md => 588 - Fix server crash when there is no DRY sampler.md} (100%) rename github-data/pull_requests/{589-CUDA_ small PP performance improvement for MoE models.md => 589 - CUDA_ small PP performance improvement for MoE models.md} (100%) rename github-data/pull_requests/{592-Another minor readme update.md => 592 - Another minor readme update.md} (100%) rename github-data/pull_requests/{593-Faster prompt processing for IQ2_KS, IQ2_K, IQ2_K_R4.md => 593 - Faster prompt processing for IQ2_KS_ IQ2_K_ IQ2_K_R4.md} (100%) rename github-data/pull_requests/{595-CUDA_ Faster prompt processing for several quantization types .md => 595 - CUDA_ Faster prompt processing for several quantization types.md} (100%) rename github-data/pull_requests/{598-Vulkan_ iquants and flash attention split_k_reduce improvement.md => 598 - Vulkan_ iquants and flash attention split_k_reduce improvement.md} (100%) rename github-data/pull_requests/{6-IQ4_K_ SOTA 4-bit quantization.md => 6 - IQ4_K_ SOTA 4-bit quantization.md} (100%) rename github-data/pull_requests/{602-Adding IQ2_KL.md => 602 - Adding IQ2_KL.md} (100%) rename github-data/pull_requests/{603-Check if MMQ should be used before using it.md => 603 - Check if MMQ should be used before using it.md} (100%) rename github-data/pull_requests/{604-Fix attn_v conditionality when quantizing..md => 604 - Fix attn_v conditionality when quantizing..md} (100%) rename github-data/pull_requests/{606-Add iq3_ks to constants.py.md => 606 - Add iq3_ks to constants.py.md} (100%) rename github-data/pull_requests/{607-vulkan_ support softmax_FA batch and broadcast.md => 607 - vulkan_ support softmax_FA batch and broadcast.md} (100%) rename github-data/pull_requests/{608-Vulkan_ a fresh start.md => 608 - Vulkan_ a fresh start.md} (100%) rename github-data/pull_requests/{609-Added kimi-k2 support (ported from llama.cpp).md => 609 - Added kimi-k2 support _ported from llama.cpp_.md} (100%) rename github-data/pull_requests/{61-Adding 
ability to have meta data per tensor row.md => 61 - Adding ability to have meta data per tensor row.md} (100%) rename github-data/pull_requests/{610-q8_k_r8_ experimental AVX512 version.md => 610 - q8_k_r8_ experimental AVX512 version.md} (100%) rename github-data/pull_requests/{611-Bump GGML_MAX_CONTEXTS to allow loading more shards.md => 611 - Bump GGML_MAX_CONTEXTS to allow loading more shards.md} (100%) rename github-data/pull_requests/{612-kimi-k2 convert script and chat template.md => 612 - kimi-k2 convert script and chat template.md} (100%) rename github-data/pull_requests/{616-Adding IQ1_KT - 1.75 bpw SOTA quants.md => 616 - Adding IQ1_KT - 1.75 bpw SOTA quants.md} (100%) rename github-data/pull_requests/{617-Fixup kimi-k2 convert indentation.md => 617 - Fixup kimi-k2 convert indentation.md} (100%) rename github-data/pull_requests/{618-Webui_ New Features for Conversations, Settings, and Chat Messages.md => 618 - Webui_ New Features for Conversations_ Settings_ and Chat Messages.md} (100%) rename github-data/pull_requests/{62-Use fp32 for K_Q in Metal FA implementation.md => 62 - Use fp32 for K_Q in Metal FA implementation.md} (100%) rename github-data/pull_requests/{620-Bump Windows max open files from 512 to 2048.md => 620 - Bump Windows max open files from 512 to 2048.md} (100%) rename github-data/pull_requests/{622-Add GGML_MAX_CONTEXTS definition in CMakeLists.txt.md => 622 - Add GGML_MAX_CONTEXTS definition in CMakeLists.txt.md} (100%) rename github-data/pull_requests/{624-Quantization tweaks.md => 624 - Quantization tweaks.md} (100%) rename github-data/pull_requests/{628-[Draft] Function calling support for Kimi-K2.md => 628 - _Draft_ Function calling support for Kimi-K2.md} (100%) rename github-data/pull_requests/{630-GEMM for IQ1_M.md => 630 - GEMM for IQ1_M.md} (100%) rename github-data/pull_requests/{64-Better sub-3-bit quantization mixes with a qkv tensor.md => 64 - Better sub-3-bit quantization mixes with a qkv tensor.md} (100%) rename 
github-data/pull_requests/{65-Adding SWIGLU unary op.md => 65 - Adding SWIGLU unary op.md} (100%) rename github-data/pull_requests/{66-CUDA non-contiguous RoPE.md => 66 - CUDA non-contiguous RoPE.md} (100%) rename github-data/pull_requests/{68-It is time to fix replace_all.md => 68 - It is time to fix replace_all.md} (100%) rename github-data/pull_requests/{69-Allow bf16 kv-cache.md => 69 - Allow bf16 kv-cache.md} (100%) rename github-data/pull_requests/{7-Adding IQ2_K, IQ3_K and IQ5_K.md => 7 - Adding IQ2_K_ IQ3_K and IQ5_K.md} (100%) rename github-data/pull_requests/{70-Fused unary(x)_y.md => 70 - Fused unary_x_y.md} (100%) rename github-data/pull_requests/{71-iqk_mul_mat_ better srategy when nrc_y not divisible by ny.md => 71 - iqk_mul_mat_ better srategy when nrc_y not divisible by ny.md} (100%) rename github-data/pull_requests/{72-iqk_mul_mat_ better iq4_nl implementation on Zen4_AVX2.md => 72 - iqk_mul_mat_ better iq4_nl implementation on Zen4_AVX2.md} (100%) rename github-data/pull_requests/{73-CUDA_ faster float -_ iq4_nl conversion.md => 73 - CUDA_ faster float -_ iq4_nl conversion.md} (100%) rename github-data/pull_requests/{74-IQ4_NL kv-cache on the CPU (Zen4_AVX2_ARM_NEON).md => 74 - IQ4_NL kv-cache on the CPU _Zen4_AVX2_ARM_NEON_.md} (100%) rename github-data/pull_requests/{75-Fix Q5_0 flash attention.md => 75 - Fix Q5_0 flash attention.md} (100%) rename github-data/pull_requests/{76-iq4_nl_ faster quantization.md => 76 - iq4_nl_ faster quantization.md} (100%) rename github-data/pull_requests/{77-Adding Q6_0.md => 77 - Adding Q6_0.md} (100%) rename github-data/pull_requests/{78-q6_0_ Slightly faster Zen4_AVX2.md => 78 - q6_0_ Slightly faster Zen4_AVX2.md} (100%) rename github-data/pull_requests/{79-Do not quantize activations if not necessary.md => 79 - Do not quantize activations if not necessary.md} (100%) rename github-data/pull_requests/{80-Move to c++17 projectwide.md => 80 - Move to c_17 projectwide.md} (100%) rename 
github-data/pull_requests/{81-Cleanup scale fudge factors.md => 81 - Cleanup scale fudge factors.md} (100%) rename github-data/pull_requests/{83-New SOTA quantization_ 4.25 bpw IQ4_KS.md => 83 - New SOTA quantization_ 4.25 bpw IQ4_KS.md} (100%) rename github-data/pull_requests/{84-Better model info.md => 84 - Better model info.md} (100%) rename github-data/pull_requests/{85-IQ2_KS_ 2.1875 bpw non-linear quantization.md => 85 - IQ2_KS_ 2.1875 bpw non-linear quantization.md} (100%) rename github-data/pull_requests/{86-Fix and optimize iq2k Metal implementation.md => 86 - Fix and optimize iq2k Metal implementation.md} (100%) rename github-data/pull_requests/{87-iq3_k_ fix and optimize Metal dot product.md => 87 - iq3_k_ fix and optimize Metal dot product.md} (100%) rename github-data/pull_requests/{89-Adding IQ4_KSS_ 4.0 bpw quants.md => 89 - Adding IQ4_KSS_ 4.0 bpw quants.md} (100%) rename github-data/pull_requests/{9-Fused soft cap and SIMD-ified GeLU .md => 9 - Fused soft cap and SIMD-ified GeLU.md} (100%) rename github-data/pull_requests/{90-iq4_ks_ faster dot product on Metal.md => 90 - iq4_ks_ faster dot product on Metal.md} (100%) rename github-data/pull_requests/{91-CLI - Specify GGML_TYPE to quantize for the main tensors..md => 91 - CLI - Specify GGML_TYPE to quantize for the main tensors..md} (100%) rename github-data/pull_requests/{93-Attempt to blindly fix Windows build failure.md => 93 - Attempt to blindly fix Windows build failure.md} (100%) rename github-data/pull_requests/{94-Adding @agray3's graph caching approach.md => 94 - Adding _agray3_s graph caching approach.md} (100%) rename github-data/pull_requests/{96-Quant strategies_ attn_q Q4 & attn_v Q6 for Llama 3.1 Q5_K_S.md => 96 - Quant strategies_ attn_q Q4 _ attn_v Q6 for Llama 3.1 Q5_K_S.md} (100%) rename github-data/pull_requests/{97-Bitnet_ make the scale tensors optional.md => 97 - Bitnet_ make the scale tensors optional.md} (100%) rename github-data/pull_requests/{98-Avoid rebuild of GGML 
graph for each token.md => 98 - Avoid rebuild of GGML graph for each token.md} (100%)
rename github-data/pull_requests/{99-Enable IQ4_NL for KV-cache in token generation using Flash Attention .md => 99 - Enable IQ4_NL for KV-cache in token generation using Flash Attention.md} (100%)

diff --git a/github-data/discussions/100-New argument _ env variable for GGML_SCHED_MAX_COPIES_.md b/github-data/discussions/100 - New argument _ env variable for GGML_SCHED_MAX_COPIES_.md
similarity index 100%
rename from github-data/discussions/100-New argument _ env variable for GGML_SCHED_MAX_COPIES_.md
rename to github-data/discussions/100 - New argument _ env variable for GGML_SCHED_MAX_COPIES_.md
diff --git a/github-data/discussions/104-Convenience improvements for llama-quantize.md b/github-data/discussions/104 - Convenience improvements for llama-quantize.md
similarity index 100%
rename from github-data/discussions/104-Convenience improvements for llama-quantize.md
rename to github-data/discussions/104 - Convenience improvements for llama-quantize.md
diff --git a/github-data/discussions/140-Questions about weight[j].md b/github-data/discussions/140 - Questions about weight_j_.md
similarity index 100%
rename from github-data/discussions/140-Questions about weight[j].md
rename to github-data/discussions/140 - Questions about weight_j_.md
diff --git a/github-data/discussions/15-Will LQER improve k- and i-quants_.md b/github-data/discussions/15 - Will LQER improve k- and i-quants_.md
similarity index 100%
rename from github-data/discussions/15-Will LQER improve k- and i-quants_.md
rename to github-data/discussions/15 - Will LQER improve k- and i-quants_.md
diff --git a/github-data/discussions/164-Latest CPU performance comparison with llama.cpp.md b/github-data/discussions/164 - Latest CPU performance comparison with llama.cpp.md
similarity index 100%
rename from github-data/discussions/164-Latest CPU performance comparison with llama.cpp.md
rename to github-data/discussions/164 - Latest CPU performance comparison with llama.cpp.md
diff --git a/github-data/discussions/165-Norm RMS Epsilon.md b/github-data/discussions/165 - Norm RMS Epsilon.md
similarity index 100%
rename from github-data/discussions/165-Norm RMS Epsilon.md
rename to github-data/discussions/165 - Norm RMS Epsilon.md
diff --git a/github-data/discussions/166-Learning more LLM quantization.md b/github-data/discussions/166 - Learning more LLM quantization.md
similarity index 100%
rename from github-data/discussions/166-Learning more LLM quantization.md
rename to github-data/discussions/166 - Learning more LLM quantization.md
diff --git a/github-data/discussions/18-CPU beating GPU in token generation speed.md b/github-data/discussions/18 - CPU beating GPU in token generation speed.md
similarity index 100%
rename from github-data/discussions/18-CPU beating GPU in token generation speed.md
rename to github-data/discussions/18 - CPU beating GPU in token generation speed.md
diff --git a/github-data/discussions/201-What is the NUMA situation _.md b/github-data/discussions/201 - What is the NUMA situation _.md
similarity index 100%
rename from github-data/discussions/201-What is the NUMA situation _.md
rename to github-data/discussions/201 - What is the NUMA situation _.md
diff --git a/github-data/discussions/211-help me create an importance matrix primer.md b/github-data/discussions/211 - help me create an importance matrix primer.md
similarity index 100%
rename from github-data/discussions/211-help me create an importance matrix primer.md
rename to github-data/discussions/211 - help me create an importance matrix primer.md
diff --git a/github-data/discussions/223-Recent performance testing with DeepSeek R1.md b/github-data/discussions/223 - Recent performance testing with DeepSeek R1.md
similarity index 100%
rename from github-data/discussions/223-Recent performance testing with DeepSeek R1.md
rename to github-data/discussions/223 - Recent performance testing with DeepSeek R1.md
diff --git a/github-data/discussions/242-Switching from llama.cpp_ktransformers, seeking advice_guidance.md b/github-data/discussions/242 - Switching from llama.cpp_ktransformers_ seeking advice_guidance.md
similarity index 100%
rename from github-data/discussions/242-Switching from llama.cpp_ktransformers, seeking advice_guidance.md
rename to github-data/discussions/242 - Switching from llama.cpp_ktransformers_ seeking advice_guidance.md
diff --git a/github-data/discussions/25-CPU prompt processing speed for large contexts.md b/github-data/discussions/25 - CPU prompt processing speed for large contexts.md
similarity index 100%
rename from github-data/discussions/25-CPU prompt processing speed for large contexts.md
rename to github-data/discussions/25 - CPU prompt processing speed for large contexts.md
diff --git a/github-data/discussions/256-Diverging from llama.cpp.md b/github-data/discussions/256 - Diverging from llama.cpp.md
similarity index 100%
rename from github-data/discussions/256-Diverging from llama.cpp.md
rename to github-data/discussions/256 - Diverging from llama.cpp.md
diff --git a/github-data/discussions/258-Quick-start Guide coming over from llama.cpp and ktransformers!.md b/github-data/discussions/258 - Quick-start Guide coming over from llama.cpp and ktransformers_.md
similarity index 100%
rename from github-data/discussions/258-Quick-start Guide coming over from llama.cpp and ktransformers!.md
rename to github-data/discussions/258 - Quick-start Guide coming over from llama.cpp and ktransformers_.md
diff --git a/github-data/discussions/266-Benchmarking DeepSeek R1 - 16x3090.md b/github-data/discussions/266 - Benchmarking DeepSeek R1 - 16x3090.md
similarity index 100%
rename from github-data/discussions/266-Benchmarking DeepSeek R1 - 16x3090.md
rename to github-data/discussions/266 - Benchmarking DeepSeek R1 - 16x3090.md
diff --git a/github-data/discussions/286-Testing `deepseek-ai_DeepSeek-V3-0324` model support..md b/github-data/discussions/286 - Testing _deepseek-ai_DeepSeek-V3-0324_ model support..md
similarity index 100%
rename from github-data/discussions/286-Testing `deepseek-ai_DeepSeek-V3-0324` model support..md
rename to github-data/discussions/286 - Testing _deepseek-ai_DeepSeek-V3-0324_ model support..md
diff --git a/github-data/discussions/288-On @compilade's PR 12557 and @jukofyork's quantization ideas.md b/github-data/discussions/288 - On _compilade_s PR 12557 and _jukofyork_s quantization ideas.md
similarity index 100%
rename from github-data/discussions/288-On @compilade's PR 12557 and @jukofyork's quantization ideas.md
rename to github-data/discussions/288 - On _compilade_s PR 12557 and _jukofyork_s quantization ideas.md
diff --git a/github-data/discussions/316-Mainline is now copying stuff from ik_llama.cpp.md b/github-data/discussions/316 - Mainline is now copying stuff from ik_llama.cpp.md
similarity index 100%
rename from github-data/discussions/316-Mainline is now copying stuff from ik_llama.cpp.md
rename to github-data/discussions/316 - Mainline is now copying stuff from ik_llama.cpp.md
diff --git a/github-data/discussions/319-KTransformers copying ik_llama.cpp.md b/github-data/discussions/319 - KTransformers copying ik_llama.cpp.md
similarity index 100%
rename from github-data/discussions/319-KTransformers copying ik_llama.cpp.md
rename to github-data/discussions/319 - KTransformers copying ik_llama.cpp.md
diff --git a/github-data/discussions/323-Is there an easy way to repack an existing GGUF so it could be used without --run-time-repack (thus enabling mmap).md b/github-data/discussions/323 - Is there an easy way to repack an existing GGUF so it could be used wit.md
similarity index 100%
rename from github-data/discussions/323-Is there an easy way to repack an existing GGUF so it could be used without --run-time-repack (thus enabling mmap).md
rename to github-data/discussions/323 - Is there an easy way to repack an existing GGUF so it could be used wit.md
diff --git
a/github-data/discussions/334-`iq4_ks` performs great on gemma-3-27b-it-qat-q4_0-unquantized.md b/github-data/discussions/334 - _iq4_ks_ performs great on gemma-3-27b-it-qat-q4_0-unquantized.md similarity index 100% rename from github-data/discussions/334-`iq4_ks` performs great on gemma-3-27b-it-qat-q4_0-unquantized.md rename to github-data/discussions/334 - _iq4_ks_ performs great on gemma-3-27b-it-qat-q4_0-unquantized.md diff --git a/github-data/discussions/350-Maverick slow prompt with gpu.md b/github-data/discussions/350 - Maverick slow prompt with gpu.md similarity index 100% rename from github-data/discussions/350-Maverick slow prompt with gpu.md rename to github-data/discussions/350 - Maverick slow prompt with gpu.md diff --git a/github-data/discussions/354-Not all MLAs are born equal.md b/github-data/discussions/354 - Not all MLAs are born equal.md similarity index 100% rename from github-data/discussions/354-Not all MLAs are born equal.md rename to github-data/discussions/354 - Not all MLAs are born equal.md diff --git a/github-data/discussions/357-Qwen3 - early performance comparisons.md b/github-data/discussions/357 - Qwen3 - early performance comparisons.md similarity index 100% rename from github-data/discussions/357-Qwen3 - early performance comparisons.md rename to github-data/discussions/357 - Qwen3 - early performance comparisons.md diff --git a/github-data/discussions/359-Qwen3 quantization experiments.md b/github-data/discussions/359 - Qwen3 quantization experiments.md similarity index 100% rename from github-data/discussions/359-Qwen3 quantization experiments.md rename to github-data/discussions/359 - Qwen3 quantization experiments.md diff --git a/github-data/discussions/372-multy gpu.md b/github-data/discussions/372 - multy gpu.md similarity index 100% rename from github-data/discussions/372-multy gpu.md rename to github-data/discussions/372 - multy gpu.md diff --git a/github-data/discussions/384-ik_llama.cpp issues on an old workstation.md 
b/github-data/discussions/384 - ik_llama.cpp issues on an old workstation.md similarity index 100% rename from github-data/discussions/384-ik_llama.cpp issues on an old workstation.md rename to github-data/discussions/384 - ik_llama.cpp issues on an old workstation.md diff --git a/github-data/discussions/385-Qwen3 235B performance on Intel Xeon Scalable processor.md b/github-data/discussions/385 - Qwen3 235B performance on Intel Xeon Scalable processor.md similarity index 100% rename from github-data/discussions/385-Qwen3 235B performance on Intel Xeon Scalable processor.md rename to github-data/discussions/385 - Qwen3 235B performance on Intel Xeon Scalable processor.md diff --git a/github-data/discussions/393-Creating quantized models.md b/github-data/discussions/393 - Creating quantized models.md similarity index 100% rename from github-data/discussions/393-Creating quantized models.md rename to github-data/discussions/393 - Creating quantized models.md diff --git a/github-data/discussions/395-Why does imatrix not tokenize special tokens_.md b/github-data/discussions/395 - Why does imatrix not tokenize special tokens_.md similarity index 100% rename from github-data/discussions/395-Why does imatrix not tokenize special tokens_.md rename to github-data/discussions/395 - Why does imatrix not tokenize special tokens_.md diff --git a/github-data/discussions/396-Best settings for Maverick - Dual CPU Xeon 8480+ - RTX 3090.md b/github-data/discussions/396 - Best settings for Maverick - Dual CPU Xeon 8480_ - RTX 3090.md similarity index 100% rename from github-data/discussions/396-Best settings for Maverick - Dual CPU Xeon 8480+ - RTX 3090.md rename to github-data/discussions/396 - Best settings for Maverick - Dual CPU Xeon 8480_ - RTX 3090.md diff --git a/github-data/discussions/397-KV split while using `-sm row`.md b/github-data/discussions/397 - KV split while using _-sm row_.md similarity index 100% rename from github-data/discussions/397-KV split while using `-sm 
row`.md rename to github-data/discussions/397 - KV split while using _-sm row_.md diff --git a/github-data/discussions/399-Qwen 30b.A3b IK_LCPP comparisons on lowspec machine.md b/github-data/discussions/399 - Qwen 30b.A3b IK_LCPP comparisons on lowspec machine.md similarity index 100% rename from github-data/discussions/399-Qwen 30b.A3b IK_LCPP comparisons on lowspec machine.md rename to github-data/discussions/399 - Qwen 30b.A3b IK_LCPP comparisons on lowspec machine.md diff --git a/github-data/discussions/401-install bitnet (or other cpu models) on a fresh termux aarch64.md b/github-data/discussions/401 - install bitnet _or other cpu models_ on a fresh termux aarch64.md similarity index 100% rename from github-data/discussions/401-install bitnet (or other cpu models) on a fresh termux aarch64.md rename to github-data/discussions/401 - install bitnet _or other cpu models_ on a fresh termux aarch64.md diff --git a/github-data/discussions/403-Tool Calling and Structured Response (Json Mode) support.md b/github-data/discussions/403 - Tool Calling and Structured Response _Json Mode_ support.md similarity index 100% rename from github-data/discussions/403-Tool Calling and Structured Response (Json Mode) support.md rename to github-data/discussions/403 - Tool Calling and Structured Response _Json Mode_ support.md diff --git a/github-data/discussions/434-Quant Cookers Basic Guide.md b/github-data/discussions/434 - Quant Cookers Basic Guide.md similarity index 100% rename from github-data/discussions/434-Quant Cookers Basic Guide.md rename to github-data/discussions/434 - Quant Cookers Basic Guide.md diff --git a/github-data/discussions/451-Context reuse _ context shift for long prompts.md b/github-data/discussions/451 - Context reuse _ context shift for long prompts.md similarity index 100% rename from github-data/discussions/451-Context reuse _ context shift for long prompts.md rename to github-data/discussions/451 - Context reuse _ context shift for long prompts.md 
diff --git a/github-data/discussions/459-qwen3 metrics on ancient hardware (2x xeon Vs 2x P100).md b/github-data/discussions/459 - qwen3 metrics on ancient hardware _2x xeon Vs 2x P100_.md similarity index 100% rename from github-data/discussions/459-qwen3 metrics on ancient hardware (2x xeon Vs 2x P100).md rename to github-data/discussions/459 - qwen3 metrics on ancient hardware _2x xeon Vs 2x P100_.md diff --git a/github-data/discussions/466-A curiosity..md b/github-data/discussions/466 - A curiosity..md similarity index 100% rename from github-data/discussions/466-A curiosity..md rename to github-data/discussions/466 - A curiosity..md diff --git a/github-data/discussions/477-DeepSeek-R1-0528 ik quants!.md b/github-data/discussions/477 - DeepSeek-R1-0528 ik quants_.md similarity index 100% rename from github-data/discussions/477-DeepSeek-R1-0528 ik quants!.md rename to github-data/discussions/477 - DeepSeek-R1-0528 ik quants_.md diff --git a/github-data/discussions/491--rtr actually hurts prompt t_s for large ubatch_.md b/github-data/discussions/491 - -rtr actually hurts prompt t_s for large ubatch_.md similarity index 100% rename from github-data/discussions/491--rtr actually hurts prompt t_s for large ubatch_.md rename to github-data/discussions/491 - -rtr actually hurts prompt t_s for large ubatch_.md diff --git a/github-data/discussions/519-Android Build.md b/github-data/discussions/519 - Android Build.md similarity index 100% rename from github-data/discussions/519-Android Build.md rename to github-data/discussions/519 - Android Build.md diff --git a/github-data/discussions/526-Partial requant feature to save compute and time during tests..md b/github-data/discussions/526 - Partial requant feature to save compute and time during tests..md similarity index 100% rename from github-data/discussions/526-Partial requant feature to save compute and time during tests..md rename to github-data/discussions/526 - Partial requant feature to save compute and time during 
tests..md diff --git a/github-data/discussions/532-Guidance on GPU Layer Offloading Strategy in ik_llama.cpp for Multi GPU Rig (2x5090 + 2x4090).md b/github-data/discussions/532 - Guidance on GPU Layer Offloading Strategy in ik_llama.cpp for Multi GPU.md similarity index 100% rename from github-data/discussions/532-Guidance on GPU Layer Offloading Strategy in ik_llama.cpp for Multi GPU Rig (2x5090 + 2x4090).md rename to github-data/discussions/532 - Guidance on GPU Layer Offloading Strategy in ik_llama.cpp for Multi GPU.md diff --git a/github-data/discussions/543-dots.llm1 support and thanks.md b/github-data/discussions/543 - dots.llm1 support and thanks.md similarity index 100% rename from github-data/discussions/543-dots.llm1 support and thanks.md rename to github-data/discussions/543 - dots.llm1 support and thanks.md diff --git a/github-data/discussions/545-Vulkan support_.md b/github-data/discussions/545 - Vulkan support_.md similarity index 100% rename from github-data/discussions/545-Vulkan support_.md rename to github-data/discussions/545 - Vulkan support_.md diff --git a/github-data/discussions/548-Poor performance with bf16 model on Qwen3 30B-A3B.md b/github-data/discussions/548 - Poor performance with bf16 model on Qwen3 30B-A3B.md similarity index 100% rename from github-data/discussions/548-Poor performance with bf16 model on Qwen3 30B-A3B.md rename to github-data/discussions/548 - Poor performance with bf16 model on Qwen3 30B-A3B.md diff --git a/github-data/discussions/556-ik_llama.cpp for Armv8.0.md b/github-data/discussions/556 - ik_llama.cpp for Armv8.0.md similarity index 100% rename from github-data/discussions/556-ik_llama.cpp for Armv8.0.md rename to github-data/discussions/556 - ik_llama.cpp for Armv8.0.md diff --git a/github-data/discussions/562-AMD GPU Vulkan & ROCm_HIP Discussion.md b/github-data/discussions/562 - AMD GPU Vulkan _ ROCm_HIP Discussion.md similarity index 100% rename from github-data/discussions/562-AMD GPU Vulkan & ROCm_HIP 
Discussion.md rename to github-data/discussions/562 - AMD GPU Vulkan _ ROCm_HIP Discussion.md diff --git a/github-data/discussions/564-Maybe an interesting CUDA PR here..md b/github-data/discussions/564 - Maybe an interesting CUDA PR here..md similarity index 100% rename from github-data/discussions/564-Maybe an interesting CUDA PR here..md rename to github-data/discussions/564 - Maybe an interesting CUDA PR here..md diff --git a/github-data/discussions/586-Slow KV cache rm operation.md b/github-data/discussions/586 - Slow KV cache rm operation.md similarity index 100% rename from github-data/discussions/586-Slow KV cache rm operation.md rename to github-data/discussions/586 - Slow KV cache rm operation.md diff --git a/github-data/discussions/590-How important is Vulkan back-end development_.md b/github-data/discussions/590 - How important is Vulkan back-end development_.md similarity index 100% rename from github-data/discussions/590-How important is Vulkan back-end development_.md rename to github-data/discussions/590 - How important is Vulkan back-end development_.md diff --git a/github-data/discussions/591-I dont see any speed improvement in generation, so want to understand if i am missing something.md b/github-data/discussions/591 - I dont see any speed improvement in generation_ so want to understand i.md similarity index 100% rename from github-data/discussions/591-I dont see any speed improvement in generation, so want to understand if i am missing something.md rename to github-data/discussions/591 - I dont see any speed improvement in generation_ so want to understand i.md diff --git a/github-data/discussions/594-Is AVX2 a hard requirement on x64_.md b/github-data/discussions/594 - Is AVX2 a hard requirement on x64_.md similarity index 100% rename from github-data/discussions/594-Is AVX2 a hard requirement on x64_.md rename to github-data/discussions/594 - Is AVX2 a hard requirement on x64_.md diff --git a/github-data/discussions/599-mla matrix 
absorbtion.md b/github-data/discussions/599 - mla matrix absorbtion.md similarity index 100% rename from github-data/discussions/599-mla matrix absorbtion.md rename to github-data/discussions/599 - mla matrix absorbtion.md diff --git a/github-data/discussions/613-Pathological Quant_CUDA combinations -- How to know what works_.md b/github-data/discussions/613 - Pathological Quant_CUDA combinations -- How to know what works_.md similarity index 100% rename from github-data/discussions/613-Pathological Quant_CUDA combinations -- How to know what works_.md rename to github-data/discussions/613 - Pathological Quant_CUDA combinations -- How to know what works_.md diff --git a/github-data/discussions/619-gpu p2p utilization.md b/github-data/discussions/619 - gpu p2p utilization.md similarity index 100% rename from github-data/discussions/619-gpu p2p utilization.md rename to github-data/discussions/619 - gpu p2p utilization.md diff --git a/github-data/discussions/621-Deepseek v3_r1 poisoned prompt_.md b/github-data/discussions/621 - Deepseek v3_r1 poisoned prompt_.md similarity index 100% rename from github-data/discussions/621-Deepseek v3_r1 poisoned prompt_.md rename to github-data/discussions/621 - Deepseek v3_r1 poisoned prompt_.md diff --git a/github-data/discussions/623-Quantizing panels_bundles instead of blocks_.md b/github-data/discussions/623 - Quantizing panels_bundles instead of blocks_.md similarity index 100% rename from github-data/discussions/623-Quantizing panels_bundles instead of blocks_.md rename to github-data/discussions/623 - Quantizing panels_bundles instead of blocks_.md diff --git a/github-data/discussions/63-LLaMA-3.2 quantization evaluation.md b/github-data/discussions/63 - LLaMA-3.2 quantization evaluation.md similarity index 100% rename from github-data/discussions/63-LLaMA-3.2 quantization evaluation.md rename to github-data/discussions/63 - LLaMA-3.2 quantization evaluation.md diff --git a/github-data/discussions/8-New quantization types 
IQ2_K, IQ3_K, IQ4_K, IQ5_K.md b/github-data/discussions/8 - New quantization types IQ2_K_ IQ3_K_ IQ4_K_ IQ5_K.md similarity index 100% rename from github-data/discussions/8-New quantization types IQ2_K, IQ3_K, IQ4_K, IQ5_K.md rename to github-data/discussions/8 - New quantization types IQ2_K_ IQ3_K_ IQ4_K_ IQ5_K.md diff --git a/github-data/discussions/82-4bpw GGML TYPE_.md b/github-data/discussions/82 - 4bpw GGML TYPE_.md similarity index 100% rename from github-data/discussions/82-4bpw GGML TYPE_.md rename to github-data/discussions/82 - 4bpw GGML TYPE_.md diff --git a/github-data/discussions/95-Bitnet.md b/github-data/discussions/95 - Bitnet.md similarity index 100% rename from github-data/discussions/95-Bitnet.md rename to github-data/discussions/95 - Bitnet.md diff --git a/github-data/issues/103-Bug_ K cache without FA.md b/github-data/issues/103 - Bug_ K cache without FA.md similarity index 100% rename from github-data/issues/103-Bug_ K cache without FA.md rename to github-data/issues/103 - Bug_ K cache without FA.md diff --git a/github-data/issues/133-Refactor_ update ggml library_.md b/github-data/issues/133 - Refactor_ update ggml library_.md similarity index 100% rename from github-data/issues/133-Refactor_ update ggml library_.md rename to github-data/issues/133 - Refactor_ update ggml library_.md diff --git a/github-data/issues/159-Feature Request_ steps how to compile as cmake i struction on the origi al repo not work here..md b/github-data/issues/159 - Feature Request_ steps how to compile as cmake i struction on the origi.md similarity index 100% rename from github-data/issues/159-Feature Request_ steps how to compile as cmake i struction on the origi al repo not work here..md rename to github-data/issues/159 - Feature Request_ steps how to compile as cmake i struction on the origi.md diff --git a/github-data/issues/160-Bug_ Can't compile on MSVC 2022.md b/github-data/issues/160 - Bug_ Can_t compile on MSVC 2022.md similarity index 100% rename from 
github-data/issues/160-Bug_ Can't compile on MSVC 2022.md rename to github-data/issues/160 - Bug_ Can_t compile on MSVC 2022.md diff --git a/github-data/issues/167-Bug_ Unable to quantize Falcon 10B 1.58 bitnet model.md b/github-data/issues/167 - Bug_ Unable to quantize Falcon 10B 1.58 bitnet model.md similarity index 100% rename from github-data/issues/167-Bug_ Unable to quantize Falcon 10B 1.58 bitnet model.md rename to github-data/issues/167 - Bug_ Unable to quantize Falcon 10B 1.58 bitnet model.md diff --git a/github-data/issues/183-Refactor_ iqk_mul_mat.md b/github-data/issues/183 - Refactor_ iqk_mul_mat.md similarity index 100% rename from github-data/issues/183-Refactor_ iqk_mul_mat.md rename to github-data/issues/183 - Refactor_ iqk_mul_mat.md diff --git a/github-data/issues/196-Refactor_ remove usage of Q8_1 for activation quantization.md b/github-data/issues/196 - Refactor_ remove usage of Q8_1 for activation quantization.md similarity index 100% rename from github-data/issues/196-Refactor_ remove usage of Q8_1 for activation quantization.md rename to github-data/issues/196 - Refactor_ remove usage of Q8_1 for activation quantization.md diff --git a/github-data/issues/199-Bug_ Changing system_prompt on llama-server at runtime breaks parallel processing.md b/github-data/issues/199 - Bug_ Changing system_prompt on llama-server at runtime breaks parallel .md similarity index 100% rename from github-data/issues/199-Bug_ Changing system_prompt on llama-server at runtime breaks parallel processing.md rename to github-data/issues/199 - Bug_ Changing system_prompt on llama-server at runtime breaks parallel .md diff --git a/github-data/issues/203-Bug_ Compliation Error for Intel(R) Xeon(R) Gold 6326 CPU.md b/github-data/issues/203 - Bug_ Compliation Error for Intel_R_ Xeon_R_ Gold 6326 CPU.md similarity index 100% rename from github-data/issues/203-Bug_ Compliation Error for Intel(R) Xeon(R) Gold 6326 CPU.md rename to github-data/issues/203 - Bug_ Compliation 
Error for Intel_R_ Xeon_R_ Gold 6326 CPU.md diff --git a/github-data/issues/209-Does the iqk_mul_mat.cpp support 1.58-bit quantization model_.md b/github-data/issues/209 - Does the iqk_mul_mat.cpp support 1.58-bit quantization model_.md similarity index 100% rename from github-data/issues/209-Does the iqk_mul_mat.cpp support 1.58-bit quantization model_.md rename to github-data/issues/209 - Does the iqk_mul_mat.cpp support 1.58-bit quantization model_.md diff --git a/github-data/issues/214-AVX512 build error.md b/github-data/issues/214 - AVX512 build error.md similarity index 100% rename from github-data/issues/214-AVX512 build error.md rename to github-data/issues/214 - AVX512 build error.md diff --git a/github-data/issues/217-Bug_ CPU FA with fp16 K-cache is broken.md b/github-data/issues/217 - Bug_ CPU FA with fp16 K-cache is broken.md similarity index 100% rename from github-data/issues/217-Bug_ CPU FA with fp16 K-cache is broken.md rename to github-data/issues/217 - Bug_ CPU FA with fp16 K-cache is broken.md diff --git a/github-data/issues/224-Bug_ IQK_FA_ALL_QUANTS causes failure to compile.md b/github-data/issues/224 - Bug_ IQK_FA_ALL_QUANTS causes failure to compile.md similarity index 100% rename from github-data/issues/224-Bug_ IQK_FA_ALL_QUANTS causes failure to compile.md rename to github-data/issues/224 - Bug_ IQK_FA_ALL_QUANTS causes failure to compile.md diff --git a/github-data/issues/227-Prevent FA usage on CUDA when K and V head sizes are different.md b/github-data/issues/227 - Prevent FA usage on CUDA when K and V head sizes are different.md similarity index 100% rename from github-data/issues/227-Prevent FA usage on CUDA when K and V head sizes are different.md rename to github-data/issues/227 - Prevent FA usage on CUDA when K and V head sizes are different.md diff --git a/github-data/issues/228-Feature Request_ create tool to offline repack models.md b/github-data/issues/228 - Feature Request_ create tool to offline repack models.md similarity 
index 100% rename from github-data/issues/228-Feature Request_ create tool to offline repack models.md rename to github-data/issues/228 - Feature Request_ create tool to offline repack models.md diff --git a/github-data/issues/230-Weird assert when using online repacking.md b/github-data/issues/230 - Weird assert when using online repacking.md similarity index 100% rename from github-data/issues/230-Weird assert when using online repacking.md rename to github-data/issues/230 - Weird assert when using online repacking.md diff --git a/github-data/issues/245-Bug_ Perplexity returns NaN with IQ4_KSS quantisation.md b/github-data/issues/245 - Bug_ Perplexity returns NaN with IQ4_KSS quantisation.md similarity index 100% rename from github-data/issues/245-Bug_ Perplexity returns NaN with IQ4_KSS quantisation.md rename to github-data/issues/245 - Bug_ Perplexity returns NaN with IQ4_KSS quantisation.md diff --git a/github-data/issues/249-CUDA_ results for MoE models are not reproducible.md b/github-data/issues/249 - CUDA_ results for MoE models are not reproducible.md similarity index 100% rename from github-data/issues/249-CUDA_ results for MoE models are not reproducible.md rename to github-data/issues/249 - CUDA_ results for MoE models are not reproducible.md diff --git a/github-data/issues/254-Split-mode row.md b/github-data/issues/254 - Split-mode row.md similarity index 100% rename from github-data/issues/254-Split-mode row.md rename to github-data/issues/254 - Split-mode row.md diff --git a/github-data/issues/255-Feature Request_ dynamic layer by layer offloading during prompt processing for VRAM constrained scenarios.md b/github-data/issues/255 - Feature Request_ dynamic layer by layer offloading during prompt proces.md similarity index 100% rename from github-data/issues/255-Feature Request_ dynamic layer by layer offloading during prompt processing for VRAM constrained scenarios.md rename to github-data/issues/255 - Feature Request_ dynamic layer by layer 
offloading during prompt proces.md diff --git a/github-data/issues/257-Bug_ mla=2 in llama-server will crash when request done.md b/github-data/issues/257 - Bug_ mla_2 in llama-server will crash when request done.md similarity index 100% rename from github-data/issues/257-Bug_ mla=2 in llama-server will crash when request done.md rename to github-data/issues/257 - Bug_ mla_2 in llama-server will crash when request done.md diff --git a/github-data/issues/26-Feature Request_ Improve CPU processing speed for large contexts.md b/github-data/issues/26 - Feature Request_ Improve CPU processing speed for large contexts.md similarity index 100% rename from github-data/issues/26-Feature Request_ Improve CPU processing speed for large contexts.md rename to github-data/issues/26 - Feature Request_ Improve CPU processing speed for large contexts.md diff --git a/github-data/issues/263-Benchmarking DeepSeek R1 - 16x3090.md b/github-data/issues/263 - Benchmarking DeepSeek R1 - 16x3090.md similarity index 100% rename from github-data/issues/263-Benchmarking DeepSeek R1 - 16x3090.md rename to github-data/issues/263 - Benchmarking DeepSeek R1 - 16x3090.md diff --git a/github-data/issues/267-Feature Request_ HugePage mmap alloc for DeepSeek V3_R1.md b/github-data/issues/267 - Feature Request_ HugePage mmap alloc for DeepSeek V3_R1.md similarity index 100% rename from github-data/issues/267-Feature Request_ HugePage mmap alloc for DeepSeek V3_R1.md rename to github-data/issues/267 - Feature Request_ HugePage mmap alloc for DeepSeek V3_R1.md diff --git a/github-data/issues/271-Possible regression computing `wk_b` tensors on the fly after PR #265.md b/github-data/issues/271 - Possible regression computing _wk_b_ tensors on the fly after PR _265.md similarity index 100% rename from github-data/issues/271-Possible regression computing `wk_b` tensors on the fly after PR #265.md rename to github-data/issues/271 - Possible regression computing _wk_b_ tensors on the fly after PR _265.md diff 
--git a/github-data/issues/281-Bug_ Strange dips in TG performance.md b/github-data/issues/281 - Bug_ Strange dips in TG performance.md similarity index 100% rename from github-data/issues/281-Bug_ Strange dips in TG performance.md rename to github-data/issues/281 - Bug_ Strange dips in TG performance.md diff --git a/github-data/issues/285-llama-perplexity giving all NaNs on unsloth Q8_0 quant.md b/github-data/issues/285 - llama-perplexity giving all NaNs on unsloth Q8_0 quant.md similarity index 100% rename from github-data/issues/285-llama-perplexity giving all NaNs on unsloth Q8_0 quant.md rename to github-data/issues/285 - llama-perplexity giving all NaNs on unsloth Q8_0 quant.md diff --git a/github-data/issues/29-Bug_ some ifdefs missing in ggml_src_iqk_iqk_quantize.cpp.md b/github-data/issues/29 - Bug_ some ifdefs missing in ggml_src_iqk_iqk_quantize.cpp.md similarity index 100% rename from github-data/issues/29-Bug_ some ifdefs missing in ggml_src_iqk_iqk_quantize.cpp.md rename to github-data/issues/29 - Bug_ some ifdefs missing in ggml_src_iqk_iqk_quantize.cpp.md diff --git a/github-data/issues/293-Feature Request_ IQ6_K row interleaved quant.md b/github-data/issues/293 - Feature Request_ IQ6_K row interleaved quant.md similarity index 100% rename from github-data/issues/293-Feature Request_ IQ6_K row interleaved quant.md rename to github-data/issues/293 - Feature Request_ IQ6_K row interleaved quant.md diff --git a/github-data/issues/296-Possible numerical stability issue with experimental quant of DeepSeek-V3-0324_.md b/github-data/issues/296 - Possible numerical stability issue with experimental quant of DeepSeek-.md similarity index 100% rename from github-data/issues/296-Possible numerical stability issue with experimental quant of DeepSeek-V3-0324_.md rename to github-data/issues/296 - Possible numerical stability issue with experimental quant of DeepSeek-.md diff --git a/github-data/issues/297-Update gguf-py scripts to support new quant types..md 
b/github-data/issues/297 - Update gguf-py scripts to support new quant types..md similarity index 100% rename from github-data/issues/297-Update gguf-py scripts to support new quant types..md rename to github-data/issues/297 - Update gguf-py scripts to support new quant types..md diff --git a/github-data/issues/30-Bug_ Appcrash on Windows 7 with GGML_USE_IQK_MULMAT.md b/github-data/issues/30 - Bug_ Appcrash on Windows 7 with GGML_USE_IQK_MULMAT.md similarity index 100% rename from github-data/issues/30-Bug_ Appcrash on Windows 7 with GGML_USE_IQK_MULMAT.md rename to github-data/issues/30 - Bug_ Appcrash on Windows 7 with GGML_USE_IQK_MULMAT.md diff --git a/github-data/issues/300-Bug_ IQK_FA_ALL_QUANTS causes failure to compile.md b/github-data/issues/300 - Bug_ IQK_FA_ALL_QUANTS causes failure to compile.md similarity index 100% rename from github-data/issues/300-Bug_ IQK_FA_ALL_QUANTS causes failure to compile.md rename to github-data/issues/300 - Bug_ IQK_FA_ALL_QUANTS causes failure to compile.md diff --git a/github-data/issues/305-Gibberish output when using DeepSeek-V3-0324-IQ2_K_R4 on mixed CPU + 4 GPUs with -mla (1 or 2).md b/github-data/issues/305 - Gibberish output when using DeepSeek-V3-0324-IQ2_K_R4 on mixed CPU _ 4 .md similarity index 100% rename from github-data/issues/305-Gibberish output when using DeepSeek-V3-0324-IQ2_K_R4 on mixed CPU + 4 GPUs with -mla (1 or 2).md rename to github-data/issues/305 - Gibberish output when using DeepSeek-V3-0324-IQ2_K_R4 on mixed CPU _ 4 .md diff --git a/github-data/issues/306-Confused by the -mla flag. What's supported_.md b/github-data/issues/306 - Confused by the -mla flag. What_s supported_.md similarity index 100% rename from github-data/issues/306-Confused by the -mla flag. What's supported_.md rename to github-data/issues/306 - Confused by the -mla flag. 
What_s supported_.md diff --git a/github-data/issues/308-Bug_ Compiling for arm64, error_ cannot convert ‘const uint32x4_t’ to ‘uint8x16_t’ and similar errors.md b/github-data/issues/308 - Bug_ Compiling for arm64_ error_ cannot convert _const uint32x4_t_ to _.md similarity index 100% rename from github-data/issues/308-Bug_ Compiling for arm64, error_ cannot convert ‘const uint32x4_t’ to ‘uint8x16_t’ and similar errors.md rename to github-data/issues/308 - Bug_ Compiling for arm64_ error_ cannot convert _const uint32x4_t_ to _.md diff --git a/github-data/issues/314-Llama 4 Support_.md b/github-data/issues/314 - Llama 4 Support_.md similarity index 100% rename from github-data/issues/314-Llama 4 Support_.md rename to github-data/issues/314 - Llama 4 Support_.md diff --git a/github-data/issues/322-Speculative decoding support.md b/github-data/issues/322 - Speculative decoding support.md similarity index 100% rename from github-data/issues/322-Speculative decoding support.md rename to github-data/issues/322 - Speculative decoding support.md diff --git a/github-data/issues/335-Bug_ Llama 4 generates garbage with longer context (64K+; the issue is not present in the llama.cpp).md b/github-data/issues/335 - Bug_ Llama 4 generates garbage with longer context _64K_ the issue is n.md similarity index 100% rename from github-data/issues/335-Bug_ Llama 4 generates garbage with longer context (64K+; the issue is not present in the llama.cpp).md rename to github-data/issues/335 - Bug_ Llama 4 generates garbage with longer context _64K_ the issue is n.md diff --git a/github-data/issues/339-Bug_ bitnet2b_2501 template issues.md b/github-data/issues/339 - Bug_ bitnet2b_2501 template issues.md similarity index 100% rename from github-data/issues/339-Bug_ bitnet2b_2501 template issues.md rename to github-data/issues/339 - Bug_ bitnet2b_2501 template issues.md diff --git a/github-data/issues/34-Bug_ FA fails when processing prompt lengths that are not a multiple of 8.md 
b/github-data/issues/34 - Bug_ FA fails when processing prompt lengths that are not a multiple of .md similarity index 100% rename from github-data/issues/34-Bug_ FA fails when processing prompt lengths that are not a multiple of 8.md rename to github-data/issues/34 - Bug_ FA fails when processing prompt lengths that are not a multiple of .md diff --git a/github-data/issues/340-Bug_ _unknown model architecture_ 'cohere2'_ when trying to load Command A model.md b/github-data/issues/340 - Bug_ _unknown model architecture_ _cohere2_ when trying to load Command.md similarity index 100% rename from github-data/issues/340-Bug_ _unknown model architecture_ 'cohere2'_ when trying to load Command A model.md rename to github-data/issues/340 - Bug_ _unknown model architecture_ _cohere2_ when trying to load Command.md diff --git a/github-data/issues/345-build question newbie.md b/github-data/issues/345 - build question newbie.md similarity index 100% rename from github-data/issues/345-build question newbie.md rename to github-data/issues/345 - build question newbie.md diff --git a/github-data/issues/353-Binaries releases for Windows _.md b/github-data/issues/353 - Binaries releases for Windows _.md similarity index 100% rename from github-data/issues/353-Binaries releases for Windows _.md rename to github-data/issues/353 - Binaries releases for Windows _.md diff --git a/github-data/issues/358-Bug_ IQK_FA_ALL_QUANTS causes failure to compile.md b/github-data/issues/358 - Bug_ IQK_FA_ALL_QUANTS causes failure to compile.md similarity index 100% rename from github-data/issues/358-Bug_ IQK_FA_ALL_QUANTS causes failure to compile.md rename to github-data/issues/358 - Bug_ IQK_FA_ALL_QUANTS causes failure to compile.md diff --git a/github-data/issues/361-Bug_ Build not detecting some supported ARM CPUs.md b/github-data/issues/361 - Bug_ Build not detecting some supported ARM CPUs.md similarity index 100% rename from github-data/issues/361-Bug_ Build not detecting some supported ARM 
CPUs.md rename to github-data/issues/361 - Bug_ Build not detecting some supported ARM CPUs.md diff --git a/github-data/issues/362-README language is vague wrt. _quantization improvements_.md b/github-data/issues/362 - README language is vague wrt. _quantization improvements_.md similarity index 100% rename from github-data/issues/362-README language is vague wrt. _quantization improvements_.md rename to github-data/issues/362 - README language is vague wrt. _quantization improvements_.md diff --git a/github-data/issues/363-Bug_ Gibberish output when using flash attention using Mistral-Small-Instruct-2409-Q6_K and Gemma-3-12b-it-q4_0 on CPU.md b/github-data/issues/363 - Bug_ Gibberish output when using flash attention using Mistral-Small-I.md similarity index 100% rename from github-data/issues/363-Bug_ Gibberish output when using flash attention using Mistral-Small-Instruct-2409-Q6_K and Gemma-3-12b-it-q4_0 on CPU.md rename to github-data/issues/363 - Bug_ Gibberish output when using flash attention using Mistral-Small-I.md diff --git a/github-data/issues/365-Bug_ Updated BitNet arch bitnet-b1.58.md b/github-data/issues/365 - Bug_ Updated BitNet arch bitnet-b1.58.md similarity index 100% rename from github-data/issues/365-Bug_ Updated BitNet arch bitnet-b1.58.md rename to github-data/issues/365 - Bug_ Updated BitNet arch bitnet-b1.58.md diff --git a/github-data/issues/367-Bug_ IQ1_S_R4, IQ1_M_R4 failed on Qwen3-235B-A22B.md b/github-data/issues/367 - Bug_ IQ1_S_R4_ IQ1_M_R4 failed on Qwen3-235B-A22B.md similarity index 100% rename from github-data/issues/367-Bug_ IQ1_S_R4, IQ1_M_R4 failed on Qwen3-235B-A22B.md rename to github-data/issues/367 - Bug_ IQ1_S_R4_ IQ1_M_R4 failed on Qwen3-235B-A22B.md diff --git a/github-data/issues/373-DeepSeekV3 0324 can't load newest UD quants (with MLA). Older quant works but with slower pre processing than gen speed (CPU + CUDA).md b/github-data/issues/373 - DeepSeekV3 0324 can_t load newest UD quants _with MLA_. 
Older quant wor.md similarity index 100% rename from github-data/issues/373-DeepSeekV3 0324 can't load newest UD quants (with MLA). Older quant works but with slower pre processing than gen speed (CPU + CUDA).md rename to github-data/issues/373 - DeepSeekV3 0324 can_t load newest UD quants _with MLA_. Older quant wor.md diff --git a/github-data/issues/376-Bug_ unknown model architecture_ 'deci' (when loading Llama-3_1-Nemotron-Ultra-253B).md b/github-data/issues/376 - Bug_ unknown model architecture_ _deci_ _when loading Llama-3_1-Nemotro.md similarity index 100% rename from github-data/issues/376-Bug_ unknown model architecture_ 'deci' (when loading Llama-3_1-Nemotron-Ultra-253B).md rename to github-data/issues/376 - Bug_ unknown model architecture_ _deci_ _when loading Llama-3_1-Nemotro.md diff --git a/github-data/issues/378-Feature Request_ Use ik_llama.cpp with llama-cpp-python.md b/github-data/issues/378 - Feature Request_ Use ik_llama.cpp with llama-cpp-python.md similarity index 100% rename from github-data/issues/378-Feature Request_ Use ik_llama.cpp with llama-cpp-python.md rename to github-data/issues/378 - Feature Request_ Use ik_llama.cpp with llama-cpp-python.md diff --git a/github-data/issues/379-Bug_ Cannot build on WoA.md b/github-data/issues/379 - Bug_ Cannot build on WoA.md similarity index 100% rename from github-data/issues/379-Bug_ Cannot build on WoA.md rename to github-data/issues/379 - Bug_ Cannot build on WoA.md diff --git a/github-data/issues/380-Drop at the start of generation.md b/github-data/issues/380 - Drop at the start of generation.md similarity index 100% rename from github-data/issues/380-Drop at the start of generation.md rename to github-data/issues/380 - Drop at the start of generation.md diff --git a/github-data/issues/381-ik_llama.cpp_ggml_src_ggml-cuda_fattn.cu_66_ fatal error after latest.md b/github-data/issues/381 - ik_llama.cpp_ggml_src_ggml-cuda_fattn.cu_66_ fatal error after latest.md similarity index 100% rename from 
github-data/issues/381-ik_llama.cpp_ggml_src_ggml-cuda_fattn.cu_66_ fatal error after latest.md rename to github-data/issues/381 - ik_llama.cpp_ggml_src_ggml-cuda_fattn.cu_66_ fatal error after latest.md diff --git a/github-data/issues/383-Bug_ Loading DeepSeek R1T Chimera causes _llama_model_load_ error loading model_ check_tensor_dims_ tensor 'blk.0.attn_q_b.weight' has wrong shape; expected 1536, 73728, got 1536, 24576, 1, b/github-data/issues/383 - Bug_ Loading DeepSeek R1T Chimera causes _llama_model_load_ error loadi.md similarity index 100% rename from github-data/issues/383-Bug_ Loading DeepSeek R1T Chimera causes _llama_model_load_ error loading model_ check_tensor_dims_ tensor 'blk.0.attn_q_b.weight' has wrong shape; expected 1536, 73728, got 1536, 24576, 1, rename to github-data/issues/383 - Bug_ Loading DeepSeek R1T Chimera causes _llama_model_load_ error loadi.md diff --git a/github-data/issues/387-Bug_ bitnet 1.58 on termux segmentation fault.md b/github-data/issues/387 - Bug_ bitnet 1.58 on termux segmentation fault.md similarity index 100% rename from github-data/issues/387-Bug_ bitnet 1.58 on termux segmentation fault.md rename to github-data/issues/387 - Bug_ bitnet 1.58 on termux segmentation fault.md diff --git a/github-data/issues/388-Bug_ Clash with mainline llama.cpp .so files.md b/github-data/issues/388 - Bug_ Clash with mainline llama.cpp .so files.md similarity index 100% rename from github-data/issues/388-Bug_ Clash with mainline llama.cpp .so files.md rename to github-data/issues/388 - Bug_ Clash with mainline llama.cpp .so files.md diff --git a/github-data/issues/389-Bug_ llama-batched-bench crashed with batch size _2.md b/github-data/issues/389 - Bug_ llama-batched-bench crashed with batch size _2.md similarity index 100% rename from github-data/issues/389-Bug_ llama-batched-bench crashed with batch size _2.md rename to github-data/issues/389 - Bug_ llama-batched-bench crashed with batch size _2.md diff --git 
a/github-data/issues/398-Bug_ -fmoe causing illegal memory access.md b/github-data/issues/398 - Bug_ -fmoe causing illegal memory access.md similarity index 100% rename from github-data/issues/398-Bug_ -fmoe causing illegal memory access.md rename to github-data/issues/398 - Bug_ -fmoe causing illegal memory access.md diff --git a/github-data/issues/407-Feature Request_ Support for function calling in llama-server.md b/github-data/issues/407 - Feature Request_ Support for function calling in llama-server.md similarity index 100% rename from github-data/issues/407-Feature Request_ Support for function calling in llama-server.md rename to github-data/issues/407 - Feature Request_ Support for function calling in llama-server.md diff --git a/github-data/issues/412-Bug_ Static asserts trip during compile..md b/github-data/issues/412 - Bug_ Static asserts trip during compile..md similarity index 100% rename from github-data/issues/412-Bug_ Static asserts trip during compile..md rename to github-data/issues/412 - Bug_ Static asserts trip during compile..md diff --git a/github-data/issues/419-qwen3 metrics in expert parallel(2x P100).md b/github-data/issues/419 - qwen3 metrics in expert parallel_2x P100_.md similarity index 100% rename from github-data/issues/419-qwen3 metrics in expert parallel(2x P100).md rename to github-data/issues/419 - qwen3 metrics in expert parallel_2x P100_.md diff --git a/github-data/issues/420-Bug_ standard attention is broken.md b/github-data/issues/420 - Bug_ standard attention is broken.md similarity index 100% rename from github-data/issues/420-Bug_ standard attention is broken.md rename to github-data/issues/420 - Bug_ standard attention is broken.md diff --git a/github-data/issues/423-Bug_ Compile failure undefined reference to `void mul_mat_q_case.md b/github-data/issues/423 - Bug_ Compile failure undefined reference to _void mul_mat_q_case.md similarity index 100% rename from github-data/issues/423-Bug_ Compile failure undefined 
reference to `void mul_mat_q_case.md rename to github-data/issues/423 - Bug_ Compile failure undefined reference to _void mul_mat_q_case.md diff --git a/github-data/issues/425-Bug_ CUDA error_ an illegal memory access was encountered.md b/github-data/issues/425 - Bug_ CUDA error_ an illegal memory access was encountered.md similarity index 100% rename from github-data/issues/425-Bug_ CUDA error_ an illegal memory access was encountered.md rename to github-data/issues/425 - Bug_ CUDA error_ an illegal memory access was encountered.md diff --git a/github-data/issues/432-Refactor_ GGUF v14 broke compatibility with IQx_KS quants.md b/github-data/issues/432 - Refactor_ GGUF v14 broke compatibility with IQx_KS quants.md similarity index 100% rename from github-data/issues/432-Refactor_ GGUF v14 broke compatibility with IQx_KS quants.md rename to github-data/issues/432 - Refactor_ GGUF v14 broke compatibility with IQx_KS quants.md diff --git a/github-data/issues/433-Feature Request_ CORS support.md b/github-data/issues/433 - Feature Request_ CORS support.md similarity index 100% rename from github-data/issues/433-Feature Request_ CORS support.md rename to github-data/issues/433 - Feature Request_ CORS support.md diff --git a/github-data/issues/436-Bug_ Saving the prompt cache causes Segfault.md b/github-data/issues/436 - Bug_ Saving the prompt cache causes Segfault.md similarity index 100% rename from github-data/issues/436-Bug_ Saving the prompt cache causes Segfault.md rename to github-data/issues/436 - Bug_ Saving the prompt cache causes Segfault.md diff --git a/github-data/issues/437-Feature Request_ support intel amx for further accelerate.md b/github-data/issues/437 - Feature Request_ support intel amx for further accelerate.md similarity index 100% rename from github-data/issues/437-Feature Request_ support intel amx for further accelerate.md rename to github-data/issues/437 - Feature Request_ support intel amx for further accelerate.md diff --git 
a/github-data/issues/440-Feature Request_ Top n-sigma sampler.md b/github-data/issues/440 - Feature Request_ Top n-sigma sampler.md similarity index 100% rename from github-data/issues/440-Feature Request_ Top n-sigma sampler.md rename to github-data/issues/440 - Feature Request_ Top n-sigma sampler.md diff --git a/github-data/issues/447-Compilation Error_ Error C2676.md b/github-data/issues/447 - Compilation Error_ Error C2676.md similarity index 100% rename from github-data/issues/447-Compilation Error_ Error C2676.md rename to github-data/issues/447 - Compilation Error_ Error C2676.md diff --git a/github-data/issues/450-Bug_ Performance regression.md b/github-data/issues/450 - Bug_ Performance regression.md similarity index 100% rename from github-data/issues/450-Bug_ Performance regression.md rename to github-data/issues/450 - Bug_ Performance regression.md diff --git a/github-data/issues/452-Falcon H1 Support.md b/github-data/issues/452 - Falcon H1 Support.md similarity index 100% rename from github-data/issues/452-Falcon H1 Support.md rename to github-data/issues/452 - Falcon H1 Support.md diff --git a/github-data/issues/455-Bug_ KV cache is never reused in OpenAI compatible Chat Completion api.md b/github-data/issues/455 - Bug_ KV cache is never reused in OpenAI compatible Chat Completion api.md similarity index 100% rename from github-data/issues/455-Bug_ KV cache is never reused in OpenAI compatible Chat Completion api.md rename to github-data/issues/455 - Bug_ KV cache is never reused in OpenAI compatible Chat Completion api.md diff --git a/github-data/issues/456-Bug_ no compilation without IQK_MULMAT.md b/github-data/issues/456 - Bug_ no compilation without IQK_MULMAT.md similarity index 100% rename from github-data/issues/456-Bug_ no compilation without IQK_MULMAT.md rename to github-data/issues/456 - Bug_ no compilation without IQK_MULMAT.md diff --git a/github-data/issues/463-Research_ V100 Flash Attention Implementation.md b/github-data/issues/463 - 
Research_ V100 Flash Attention Implementation.md similarity index 100% rename from github-data/issues/463-Research_ V100 Flash Attention Implementation.md rename to github-data/issues/463 - Research_ V100 Flash Attention Implementation.md diff --git a/github-data/issues/464-Bug_ The streaming every couple of rows blocks for 5-8s.md b/github-data/issues/464 - Bug_ The streaming every couple of rows blocks for 5-8s.md similarity index 100% rename from github-data/issues/464-Bug_ The streaming every couple of rows blocks for 5-8s.md rename to github-data/issues/464 - Bug_ The streaming every couple of rows blocks for 5-8s.md diff --git a/github-data/issues/467-Bug_ Server does not send data_ [DONE] for OpenAI-compatible streaming endpoint `_v1_chat_completions`.md b/github-data/issues/467 - Bug_ Server does not send data_ _DONE_ for OpenAI-compatible streaming .md similarity index 100% rename from github-data/issues/467-Bug_ Server does not send data_ [DONE] for OpenAI-compatible streaming endpoint `_v1_chat_completions`.md rename to github-data/issues/467 - Bug_ Server does not send data_ _DONE_ for OpenAI-compatible streaming .md diff --git a/github-data/issues/472-Bug_ Don't build ggml-aarch64 regardless of CPU arch type.md b/github-data/issues/472 - Bug_ Don_t build ggml-aarch64 regardless of CPU arch type.md similarity index 100% rename from github-data/issues/472-Bug_ Don't build ggml-aarch64 regardless of CPU arch type.md rename to github-data/issues/472 - Bug_ Don_t build ggml-aarch64 regardless of CPU arch type.md diff --git a/github-data/issues/474-Bug_ Perf Regression in PP throughput after Pull #461 (...R4 CUDA impl).md b/github-data/issues/474 - Bug_ Perf Regression in PP throughput after Pull _461 _...R4 CUDA impl_.md similarity index 100% rename from github-data/issues/474-Bug_ Perf Regression in PP throughput after Pull #461 (...R4 CUDA impl).md rename to github-data/issues/474 - Bug_ Perf Regression in PP throughput after Pull _461 _...R4 CUDA 
impl_.md diff --git a/github-data/issues/476-Research_ performance divergence.md b/github-data/issues/476 - Research_ performance divergence.md similarity index 100% rename from github-data/issues/476-Research_ performance divergence.md rename to github-data/issues/476 - Research_ performance divergence.md diff --git a/github-data/issues/479-Bug_ _ggml_backend_cuda_graph_compute_ disabling CUDA graphs due to GPU architecture_ flood.md b/github-data/issues/479 - Bug_ _ggml_backend_cuda_graph_compute_ disabling CUDA graphs due to GPU.md similarity index 100% rename from github-data/issues/479-Bug_ _ggml_backend_cuda_graph_compute_ disabling CUDA graphs due to GPU architecture_ flood.md rename to github-data/issues/479 - Bug_ _ggml_backend_cuda_graph_compute_ disabling CUDA graphs due to GPU.md diff --git a/github-data/issues/485-Bug_ Illegal Memory Access loading model to CUDA1.md b/github-data/issues/485 - Bug_ Illegal Memory Access loading model to CUDA1.md similarity index 100% rename from github-data/issues/485-Bug_ Illegal Memory Access loading model to CUDA1.md rename to github-data/issues/485 - Bug_ Illegal Memory Access loading model to CUDA1.md diff --git a/github-data/issues/490-Bug_ Performance drop with 14292913 #461.md b/github-data/issues/490 - Bug_ Performance drop with 14292913 _461.md similarity index 100% rename from github-data/issues/490-Bug_ Performance drop with 14292913 #461.md rename to github-data/issues/490 - Bug_ Performance drop with 14292913 _461.md diff --git a/github-data/issues/498-question_ about quantize method.md b/github-data/issues/498 - question_ about quantize method.md similarity index 100% rename from github-data/issues/498-question_ about quantize method.md rename to github-data/issues/498 - question_ about quantize method.md diff --git a/github-data/issues/499-Bug_ cache quantization crash with IQK_FORCE_BF16.md b/github-data/issues/499 - Bug_ cache quantization crash with IQK_FORCE_BF16.md similarity index 100% rename from 
github-data/issues/499-Bug_ cache quantization crash with IQK_FORCE_BF16.md rename to github-data/issues/499 - Bug_ cache quantization crash with IQK_FORCE_BF16.md diff --git a/github-data/issues/500-Bug_ Insane cudaMalloc OOM Error on Dual 3090 GPUs.md b/github-data/issues/500 - Bug_ Insane cudaMalloc OOM Error on Dual 3090 GPUs.md similarity index 100% rename from github-data/issues/500-Bug_ Insane cudaMalloc OOM Error on Dual 3090 GPUs.md rename to github-data/issues/500 - Bug_ Insane cudaMalloc OOM Error on Dual 3090 GPUs.md diff --git a/github-data/issues/503-Bug_ server_cli fails with segmentation fault.md b/github-data/issues/503 - Bug_ server_cli fails with segmentation fault.md similarity index 100% rename from github-data/issues/503-Bug_ server_cli fails with segmentation fault.md rename to github-data/issues/503 - Bug_ server_cli fails with segmentation fault.md diff --git a/github-data/issues/507-Compatible gguf models _.md b/github-data/issues/507 - Compatible gguf models _.md similarity index 100% rename from github-data/issues/507-Compatible gguf models _.md rename to github-data/issues/507 - Compatible gguf models _.md diff --git a/github-data/issues/514-CUDA Kernel Error on RTX 5090 (Compute Capability 12.0)_ _no kernel image is available for execution on the device_.md b/github-data/issues/514 - CUDA Kernel Error on RTX 5090 _Compute Capability 12.0_ _no kernel imag.md similarity index 100% rename from github-data/issues/514-CUDA Kernel Error on RTX 5090 (Compute Capability 12.0)_ _no kernel image is available for execution on the device_.md rename to github-data/issues/514 - CUDA Kernel Error on RTX 5090 _Compute Capability 12.0_ _no kernel imag.md diff --git a/github-data/issues/521-When offloading semi layers to some GPUs with -ot, TG t_s performance tanks (CUDA + CPU, DeepSeek V3-R1), while not on main llamacpp..md b/github-data/issues/521 - When offloading semi layers to some GPUs with -ot_ TG t_s performance t.md similarity index 100% rename 
from github-data/issues/521-When offloading semi layers to some GPUs with -ot, TG t_s performance tanks (CUDA + CPU, DeepSeek V3-R1), while not on main llamacpp..md rename to github-data/issues/521 - When offloading semi layers to some GPUs with -ot_ TG t_s performance t.md diff --git a/github-data/issues/522-Bug_ disabling CUDA graphs due to mul_mat_id.md b/github-data/issues/522 - Bug_ disabling CUDA graphs due to mul_mat_id.md similarity index 100% rename from github-data/issues/522-Bug_ disabling CUDA graphs due to mul_mat_id.md rename to github-data/issues/522 - Bug_ disabling CUDA graphs due to mul_mat_id.md diff --git a/github-data/issues/523-Bug_ tg speed drop after https___github.com_ikawrakow_ik_llama.cpp_pull_518.md b/github-data/issues/523 - Bug_ tg speed drop after https_github.com_ikawrakow_ik_llama.cpp_pull_5.md similarity index 100% rename from github-data/issues/523-Bug_ tg speed drop after https___github.com_ikawrakow_ik_llama.cpp_pull_518.md rename to github-data/issues/523 - Bug_ tg speed drop after https_github.com_ikawrakow_ik_llama.cpp_pull_5.md diff --git a/github-data/issues/527-Bug_ Webui improvement #481 core dump with a certain question..md b/github-data/issues/527 - Bug_ Webui improvement _481 core dump with a certain question..md similarity index 100% rename from github-data/issues/527-Bug_ Webui improvement #481 core dump with a certain question..md rename to github-data/issues/527 - Bug_ Webui improvement _481 core dump with a certain question..md diff --git a/github-data/issues/530-Getting crash on second prompt..md b/github-data/issues/530 - Getting crash on second prompt..md similarity index 100% rename from github-data/issues/530-Getting crash on second prompt..md rename to github-data/issues/530 - Getting crash on second prompt..md diff --git a/github-data/issues/538-Bug_ GGML_ASSERT failed at first prompt.md b/github-data/issues/538 - Bug_ GGML_ASSERT failed at first prompt.md similarity index 100% rename from 
github-data/issues/538-Bug_ GGML_ASSERT failed at first prompt.md rename to github-data/issues/538 - Bug_ GGML_ASSERT failed at first prompt.md diff --git a/github-data/issues/539-Bug_ garbage output.md b/github-data/issues/539 - Bug_ garbage output.md similarity index 100% rename from github-data/issues/539-Bug_ garbage output.md rename to github-data/issues/539 - Bug_ garbage output.md diff --git a/github-data/issues/551-Feature Request_ Support for Falcon Edge series.md b/github-data/issues/551 - Feature Request_ Support for Falcon Edge series.md similarity index 100% rename from github-data/issues/551-Feature Request_ Support for Falcon Edge series.md rename to github-data/issues/551 - Feature Request_ Support for Falcon Edge series.md diff --git a/github-data/issues/561-Feature Request_ Tencent Hunyuan-A13B model support.md b/github-data/issues/561 - Feature Request_ Tencent Hunyuan-A13B model support.md similarity index 100% rename from github-data/issues/561-Feature Request_ Tencent Hunyuan-A13B model support.md rename to github-data/issues/561 - Feature Request_ Tencent Hunyuan-A13B model support.md diff --git a/github-data/issues/568-Feature Request_ ERNIE MoE Model Support.md b/github-data/issues/568 - Feature Request_ ERNIE MoE Model Support.md similarity index 100% rename from github-data/issues/568-Feature Request_ ERNIE MoE Model Support.md rename to github-data/issues/568 - Feature Request_ ERNIE MoE Model Support.md diff --git a/github-data/issues/572-Bug_ Oops(ggml_compute_forward_sum_rows_f32, ffn_moe_weights_sum-60)_ found nan, on DeepSeek V3_R1 on CUDA + CPU.md b/github-data/issues/572 - Bug_ Oops_ggml_compute_forward_sum_rows_f32_ ffn_moe_weights_sum-60_ fo.md similarity index 100% rename from github-data/issues/572-Bug_ Oops(ggml_compute_forward_sum_rows_f32, ffn_moe_weights_sum-60)_ found nan, on DeepSeek V3_R1 on CUDA + CPU.md rename to github-data/issues/572 - Bug_ Oops_ggml_compute_forward_sum_rows_f32_ ffn_moe_weights_sum-60_ fo.md diff 
--git a/github-data/issues/575-Bug_ llama-server crash with sampling order.md b/github-data/issues/575 - Bug_ llama-server crash with sampling order.md similarity index 100% rename from github-data/issues/575-Bug_ llama-server crash with sampling order.md rename to github-data/issues/575 - Bug_ llama-server crash with sampling order.md diff --git a/github-data/issues/576-Bug_ llama-server crash with _Deepseek2 does not support K-shift_.md b/github-data/issues/576 - Bug_ llama-server crash with _Deepseek2 does not support K-shift_.md similarity index 100% rename from github-data/issues/576-Bug_ llama-server crash with _Deepseek2 does not support K-shift_.md rename to github-data/issues/576 - Bug_ llama-server crash with _Deepseek2 does not support K-shift_.md diff --git a/github-data/issues/59-Bug_ GGML Compilation Error_ undefined references to `iqk_mul_mat'.md b/github-data/issues/59 - Bug_ GGML Compilation Error_ undefined references to _iqk_mul_mat_.md similarity index 100% rename from github-data/issues/59-Bug_ GGML Compilation Error_ undefined references to `iqk_mul_mat'.md rename to github-data/issues/59 - Bug_ GGML Compilation Error_ undefined references to _iqk_mul_mat_.md diff --git a/github-data/issues/596-Bug_ Lastest commit broke llama-cli on Windows - mmq.cuh_107_ fatal error.md b/github-data/issues/596 - Bug_ Lastest commit broke llama-cli on Windows - mmq.cuh_107_ fatal err.md similarity index 100% rename from github-data/issues/596-Bug_ Lastest commit broke llama-cli on Windows - mmq.cuh_107_ fatal error.md rename to github-data/issues/596 - Bug_ Lastest commit broke llama-cli on Windows - mmq.cuh_107_ fatal err.md diff --git a/github-data/issues/597-Feature Request_ Add THUDM_GLM-4-MoE-100B-A10B support.md b/github-data/issues/597 - Feature Request_ Add THUDM_GLM-4-MoE-100B-A10B support.md similarity index 100% rename from github-data/issues/597-Feature Request_ Add THUDM_GLM-4-MoE-100B-A10B support.md rename to github-data/issues/597 - Feature 
Request_ Add THUDM_GLM-4-MoE-100B-A10B support.md diff --git a/github-data/issues/60-Bug_ Illegal instruction on NEON and Q4_0_4_4.md b/github-data/issues/60 - Bug_ Illegal instruction on NEON and Q4_0_4_4.md similarity index 100% rename from github-data/issues/60-Bug_ Illegal instruction on NEON and Q4_0_4_4.md rename to github-data/issues/60 - Bug_ Illegal instruction on NEON and Q4_0_4_4.md diff --git a/github-data/issues/600-Feature Request_ Port --reasoning-budget from main llamacpp (llamaserver).md b/github-data/issues/600 - Feature Request_ Port --reasoning-budget from main llamacpp _llamaserve.md similarity index 100% rename from github-data/issues/600-Feature Request_ Port --reasoning-budget from main llamacpp (llamaserver).md rename to github-data/issues/600 - Feature Request_ Port --reasoning-budget from main llamacpp _llamaserve.md diff --git a/github-data/issues/601-Bug_ llama-imatrix crashing.md b/github-data/issues/601 - Bug_ llama-imatrix crashing.md similarity index 100% rename from github-data/issues/601-Bug_ llama-imatrix crashing.md rename to github-data/issues/601 - Bug_ llama-imatrix crashing.md diff --git a/github-data/issues/605-Bug_ IQ3_KS missing from GGMLQuantizationType - gguf_reader.py script cannot process IQ3_KS tensors.md b/github-data/issues/605 - Bug_ IQ3_KS missing from GGMLQuantizationType - gguf_reader.py script c.md similarity index 100% rename from github-data/issues/605-Bug_ IQ3_KS missing from GGMLQuantizationType - gguf_reader.py script cannot process IQ3_KS tensors.md rename to github-data/issues/605 - Bug_ IQ3_KS missing from GGMLQuantizationType - gguf_reader.py script c.md diff --git a/github-data/issues/614-Feature Request_ port no-mmproj-offload.md b/github-data/issues/614 - Feature Request_ port no-mmproj-offload.md similarity index 100% rename from github-data/issues/614-Feature Request_ port no-mmproj-offload.md rename to github-data/issues/614 - Feature Request_ port no-mmproj-offload.md diff --git 
a/github-data/issues/615-Bug_ Gemma3 Vision not working.md b/github-data/issues/615 - Bug_ Gemma3 Vision not working.md similarity index 100% rename from github-data/issues/615-Bug_ Gemma3 Vision not working.md rename to github-data/issues/615 - Bug_ Gemma3 Vision not working.md diff --git a/github-data/issues/625-Bug_ undefined symbol errors after successful compilation.md b/github-data/issues/625 - Bug_ undefined symbol errors after successful compilation.md similarity index 100% rename from github-data/issues/625-Bug_ undefined symbol errors after successful compilation.md rename to github-data/issues/625 - Bug_ undefined symbol errors after successful compilation.md diff --git a/github-data/issues/626-Feature Request_ Add IQK GEMM for IQ1_M.md b/github-data/issues/626 - Feature Request_ Add IQK GEMM for IQ1_M.md similarity index 100% rename from github-data/issues/626-Feature Request_ Add IQK GEMM for IQ1_M.md rename to github-data/issues/626 - Feature Request_ Add IQK GEMM for IQ1_M.md diff --git a/github-data/issues/627-Feature Request_ Tensor Parallelism.md b/github-data/issues/627 - Feature Request_ Tensor Parallelism.md similarity index 100% rename from github-data/issues/627-Feature Request_ Tensor Parallelism.md rename to github-data/issues/627 - Feature Request_ Tensor Parallelism.md diff --git a/github-data/issues/629-Multi-GPU performance (Windows) is significantly worse than single-GPU.md b/github-data/issues/629 - Multi-GPU performance _Windows_ is significantly worse than single-GPU.md similarity index 100% rename from github-data/issues/629-Multi-GPU performance (Windows) is significantly worse than single-GPU.md rename to github-data/issues/629 - Multi-GPU performance _Windows_ is significantly worse than single-GPU.md diff --git a/github-data/issues/67-Feature Request_ Elliminate_reduce unnecessary copies .md b/github-data/issues/67 - Feature Request_ Elliminate_reduce unnecessary copies.md similarity index 100% rename from 
github-data/issues/67-Feature Request_ Elliminate_reduce unnecessary copies .md rename to github-data/issues/67 - Feature Request_ Elliminate_reduce unnecessary copies.md diff --git a/github-data/issues/88-Bug_ Won't compile on MSVC.md b/github-data/issues/88 - Bug_ Won_t compile on MSVC.md similarity index 100% rename from github-data/issues/88-Bug_ Won't compile on MSVC.md rename to github-data/issues/88 - Bug_ Won_t compile on MSVC.md diff --git a/github-data/issues/92-Bug_ Quantized KV cache produces garbage in situation where llama.cpp does not.md b/github-data/issues/92 - Bug_ Quantized KV cache produces garbage in situation where llama.cpp do.md similarity index 100% rename from github-data/issues/92-Bug_ Quantized KV cache produces garbage in situation where llama.cpp does not.md rename to github-data/issues/92 - Bug_ Quantized KV cache produces garbage in situation where llama.cpp do.md diff --git a/github-data/pull_requests/1-Offload Bitnet token embeddings to the GPU.md b/github-data/pull_requests/1 - Offload Bitnet token embeddings to the GPU.md similarity index 100% rename from github-data/pull_requests/1-Offload Bitnet token embeddings to the GPU.md rename to github-data/pull_requests/1 - Offload Bitnet token embeddings to the GPU.md diff --git a/github-data/pull_requests/10-iq4_k_ speedup quantization by a factor of ~2.md b/github-data/pull_requests/10 - iq4_k_ speedup quantization by a factor of _2.md similarity index 100% rename from github-data/pull_requests/10-iq4_k_ speedup quantization by a factor of ~2.md rename to github-data/pull_requests/10 - iq4_k_ speedup quantization by a factor of _2.md diff --git a/github-data/pull_requests/101-Enable q6_0 in flash attention.md b/github-data/pull_requests/101 - Enable q6_0 in flash attention.md similarity index 100% rename from github-data/pull_requests/101-Enable q6_0 in flash attention.md rename to github-data/pull_requests/101 - Enable q6_0 in flash attention.md diff --git 
a/github-data/pull_requests/102-Add support for Granite and GraniteMoE models.md b/github-data/pull_requests/102 - Add support for Granite and GraniteMoE models.md similarity index 100% rename from github-data/pull_requests/102-Add support for Granite and GraniteMoE models.md rename to github-data/pull_requests/102 - Add support for Granite and GraniteMoE models.md diff --git a/github-data/pull_requests/105-Fix quantized k-cache without FA.md b/github-data/pull_requests/105 - Fix quantized k-cache without FA.md similarity index 100% rename from github-data/pull_requests/105-Fix quantized k-cache without FA.md rename to github-data/pull_requests/105 - Fix quantized k-cache without FA.md diff --git a/github-data/pull_requests/106-Bitnet changes.md b/github-data/pull_requests/106 - Bitnet changes.md similarity index 100% rename from github-data/pull_requests/106-Bitnet changes.md rename to github-data/pull_requests/106 - Bitnet changes.md diff --git a/github-data/pull_requests/107-Faster IQ1_BN Metal implementation.md b/github-data/pull_requests/107 - Faster IQ1_BN Metal implementation.md similarity index 100% rename from github-data/pull_requests/107-Faster IQ1_BN Metal implementation.md rename to github-data/pull_requests/107 - Faster IQ1_BN Metal implementation.md diff --git a/github-data/pull_requests/108-Another Bitnet performance improvement on Metal.md b/github-data/pull_requests/108 - Another Bitnet performance improvement on Metal.md similarity index 100% rename from github-data/pull_requests/108-Another Bitnet performance improvement on Metal.md rename to github-data/pull_requests/108 - Another Bitnet performance improvement on Metal.md diff --git a/github-data/pull_requests/109-Bitnet CUDA improvements.md b/github-data/pull_requests/109 - Bitnet CUDA improvements.md similarity index 100% rename from github-data/pull_requests/109-Bitnet CUDA improvements.md rename to github-data/pull_requests/109 - Bitnet CUDA improvements.md diff --git 
a/github-data/pull_requests/11-Faster iq3_k and iq5_k quantization.md b/github-data/pull_requests/11 - Faster iq3_k and iq5_k quantization.md similarity index 100% rename from github-data/pull_requests/11-Faster iq3_k and iq5_k quantization.md rename to github-data/pull_requests/11 - Faster iq3_k and iq5_k quantization.md diff --git a/github-data/pull_requests/110-Bitnet_ use the fused mul-silu in the FFN network.md b/github-data/pull_requests/110 - Bitnet_ use the fused mul-silu in the FFN network.md similarity index 100% rename from github-data/pull_requests/110-Bitnet_ use the fused mul-silu in the FFN network.md rename to github-data/pull_requests/110 - Bitnet_ use the fused mul-silu in the FFN network.md diff --git a/github-data/pull_requests/111-Use fused mul - unary op also for MoE models.md b/github-data/pull_requests/111 - Use fused mul - unary op also for MoE models.md similarity index 100% rename from github-data/pull_requests/111-Use fused mul - unary op also for MoE models.md rename to github-data/pull_requests/111 - Use fused mul - unary op also for MoE models.md diff --git a/github-data/pull_requests/112-Faster MoE inference.md b/github-data/pull_requests/112 - Faster MoE inference.md similarity index 100% rename from github-data/pull_requests/112-Faster MoE inference.md rename to github-data/pull_requests/112 - Faster MoE inference.md diff --git a/github-data/pull_requests/113-Trellis quantization.md b/github-data/pull_requests/113 - Trellis quantization.md similarity index 100% rename from github-data/pull_requests/113-Trellis quantization.md rename to github-data/pull_requests/113 - Trellis quantization.md diff --git a/github-data/pull_requests/114-MMQ Kernel for Q6_0 (pretty please!).md b/github-data/pull_requests/114 - MMQ Kernel for Q6_0 _pretty please_.md similarity index 100% rename from github-data/pull_requests/114-MMQ Kernel for Q6_0 (pretty please!).md rename to github-data/pull_requests/114 - MMQ Kernel for Q6_0 _pretty please_.md diff 
--git a/github-data/pull_requests/115-MMQ for Q6_0.md b/github-data/pull_requests/115 - MMQ for Q6_0.md similarity index 100% rename from github-data/pull_requests/115-MMQ for Q6_0.md rename to github-data/pull_requests/115 - MMQ for Q6_0.md diff --git a/github-data/pull_requests/116-Use Q6_0 instead of Q5_1 for tensors incompatible with IQ5_K_Q5_K.md b/github-data/pull_requests/116 - Use Q6_0 instead of Q5_1 for tensors incompatible with IQ5_K_Q5_K.md similarity index 100% rename from github-data/pull_requests/116-Use Q6_0 instead of Q5_1 for tensors incompatible with IQ5_K_Q5_K.md rename to github-data/pull_requests/116 - Use Q6_0 instead of Q5_1 for tensors incompatible with IQ5_K_Q5_K.md diff --git a/github-data/pull_requests/117-Some minor quant strategies tweaks.md b/github-data/pull_requests/117 - Some minor quant strategies tweaks.md similarity index 100% rename from github-data/pull_requests/117-Some minor quant strategies tweaks.md rename to github-data/pull_requests/117 - Some minor quant strategies tweaks.md diff --git a/github-data/pull_requests/118-IQ4_NL_X4.md b/github-data/pull_requests/118 - IQ4_NL_X4.md similarity index 100% rename from github-data/pull_requests/118-IQ4_NL_X4.md rename to github-data/pull_requests/118 - IQ4_NL_X4.md diff --git a/github-data/pull_requests/119-Q4_0_R4.md b/github-data/pull_requests/119 - Q4_0_R4.md similarity index 100% rename from github-data/pull_requests/119-Q4_0_R4.md rename to github-data/pull_requests/119 - Q4_0_R4.md diff --git a/github-data/pull_requests/12-q2_K_ allow it to detect ternary nets and quantize accordingly.md b/github-data/pull_requests/12 - q2_K_ allow it to detect ternary nets and quantize accordingly.md similarity index 100% rename from github-data/pull_requests/12-q2_K_ allow it to detect ternary nets and quantize accordingly.md rename to github-data/pull_requests/12 - q2_K_ allow it to detect ternary nets and quantize accordingly.md diff --git a/github-data/pull_requests/120-Q8_0_R4.md 
b/github-data/pull_requests/120 - Q8_0_R4.md similarity index 100% rename from github-data/pull_requests/120-Q8_0_R4.md rename to github-data/pull_requests/120 - Q8_0_R4.md diff --git a/github-data/pull_requests/121-Q5_0_R4.md b/github-data/pull_requests/121 - Q5_0_R4.md similarity index 100% rename from github-data/pull_requests/121-Q5_0_R4.md rename to github-data/pull_requests/121 - Q5_0_R4.md diff --git a/github-data/pull_requests/122-Q6_0_R4.md b/github-data/pull_requests/122 - Q6_0_R4.md similarity index 100% rename from github-data/pull_requests/122-Q6_0_R4.md rename to github-data/pull_requests/122 - Q6_0_R4.md diff --git a/github-data/pull_requests/123-IQ4_XS_R4.md b/github-data/pull_requests/123 - IQ4_XS_R4.md similarity index 100% rename from github-data/pull_requests/123-IQ4_XS_R4.md rename to github-data/pull_requests/123 - IQ4_XS_R4.md diff --git a/github-data/pull_requests/124-iq2_bn_r4_ fastest Bitnet CPU implementation on the planet.md b/github-data/pull_requests/124 - iq2_bn_r4_ fastest Bitnet CPU implementation on the planet.md similarity index 100% rename from github-data/pull_requests/124-iq2_bn_r4_ fastest Bitnet CPU implementation on the planet.md rename to github-data/pull_requests/124 - iq2_bn_r4_ fastest Bitnet CPU implementation on the planet.md diff --git a/github-data/pull_requests/125-R4 improvements on ARM_NEON.md b/github-data/pull_requests/125 - R4 improvements on ARM_NEON.md similarity index 100% rename from github-data/pull_requests/125-R4 improvements on ARM_NEON.md rename to github-data/pull_requests/125 - R4 improvements on ARM_NEON.md diff --git a/github-data/pull_requests/126-Rename iq4_nl_x4 to iq4_nl_r4.md b/github-data/pull_requests/126 - Rename iq4_nl_x4 to iq4_nl_r4.md similarity index 100% rename from github-data/pull_requests/126-Rename iq4_nl_x4 to iq4_nl_r4.md rename to github-data/pull_requests/126 - Rename iq4_nl_x4 to iq4_nl_r4.md diff --git a/github-data/pull_requests/127-Q4_0_R4 on CUDA.md 
b/github-data/pull_requests/127 - Q4_0_R4 on CUDA.md similarity index 100% rename from github-data/pull_requests/127-Q4_0_R4 on CUDA.md rename to github-data/pull_requests/127 - Q4_0_R4 on CUDA.md diff --git a/github-data/pull_requests/128-Faster IQ4_XS_R4 on Zen4.md b/github-data/pull_requests/128 - Faster IQ4_XS_R4 on Zen4.md similarity index 100% rename from github-data/pull_requests/128-Faster IQ4_XS_R4 on Zen4.md rename to github-data/pull_requests/128 - Faster IQ4_XS_R4 on Zen4.md diff --git a/github-data/pull_requests/129-Q4_K_R4.md b/github-data/pull_requests/129 - Q4_K_R4.md similarity index 100% rename from github-data/pull_requests/129-Q4_K_R4.md rename to github-data/pull_requests/129 - Q4_K_R4.md diff --git a/github-data/pull_requests/13-Adding IQ2_TN for use with ternary models.md b/github-data/pull_requests/13 - Adding IQ2_TN for use with ternary models.md similarity index 100% rename from github-data/pull_requests/13-Adding IQ2_TN for use with ternary models.md rename to github-data/pull_requests/13 - Adding IQ2_TN for use with ternary models.md diff --git a/github-data/pull_requests/130-Q6_K_R4.md b/github-data/pull_requests/130 - Q6_K_R4.md similarity index 100% rename from github-data/pull_requests/130-Q6_K_R4.md rename to github-data/pull_requests/130 - Q6_K_R4.md diff --git a/github-data/pull_requests/131-Slightly faster Q4_K_R4 and IQ4_XS_R4 on Zen4.md b/github-data/pull_requests/131 - Slightly faster Q4_K_R4 and IQ4_XS_R4 on Zen4.md similarity index 100% rename from github-data/pull_requests/131-Slightly faster Q4_K_R4 and IQ4_XS_R4 on Zen4.md rename to github-data/pull_requests/131 - Slightly faster Q4_K_R4 and IQ4_XS_R4 on Zen4.md diff --git a/github-data/pull_requests/132-Q5_K_R4.md b/github-data/pull_requests/132 - Q5_K_R4.md similarity index 100% rename from github-data/pull_requests/132-Q5_K_R4.md rename to github-data/pull_requests/132 - Q5_K_R4.md diff --git a/github-data/pull_requests/134-Q3_K_R4.md b/github-data/pull_requests/134 - 
Q3_K_R4.md similarity index 100% rename from github-data/pull_requests/134-Q3_K_R4.md rename to github-data/pull_requests/134 - Q3_K_R4.md diff --git a/github-data/pull_requests/135-Better ARM_NEON implementation for R4 quants.md b/github-data/pull_requests/135 - Better ARM_NEON implementation for R4 quants.md similarity index 100% rename from github-data/pull_requests/135-Better ARM_NEON implementation for R4 quants.md rename to github-data/pull_requests/135 - Better ARM_NEON implementation for R4 quants.md diff --git a/github-data/pull_requests/136-Q2_K_R4.md b/github-data/pull_requests/136 - Q2_K_R4.md similarity index 100% rename from github-data/pull_requests/136-Q2_K_R4.md rename to github-data/pull_requests/136 - Q2_K_R4.md diff --git a/github-data/pull_requests/137-Fix AVX2 implementation of iq4_nl_r4.md b/github-data/pull_requests/137 - Fix AVX2 implementation of iq4_nl_r4.md similarity index 100% rename from github-data/pull_requests/137-Fix AVX2 implementation of iq4_nl_r4.md rename to github-data/pull_requests/137 - Fix AVX2 implementation of iq4_nl_r4.md diff --git a/github-data/pull_requests/138-IQ4_K_R4.md b/github-data/pull_requests/138 - IQ4_K_R4.md similarity index 100% rename from github-data/pull_requests/138-IQ4_K_R4.md rename to github-data/pull_requests/138 - IQ4_K_R4.md diff --git a/github-data/pull_requests/139-Faster R4 quants on Zen4.md b/github-data/pull_requests/139 - Faster R4 quants on Zen4.md similarity index 100% rename from github-data/pull_requests/139-Faster R4 quants on Zen4.md rename to github-data/pull_requests/139 - Faster R4 quants on Zen4.md diff --git a/github-data/pull_requests/14-Adding IQ6_K.md b/github-data/pull_requests/14 - Adding IQ6_K.md similarity index 100% rename from github-data/pull_requests/14-Adding IQ6_K.md rename to github-data/pull_requests/14 - Adding IQ6_K.md diff --git a/github-data/pull_requests/141-Q8_K_R8_ Fastest quantized matrix multiplications.md b/github-data/pull_requests/141 - Q8_K_R8_ Fastest 
quantized matrix multiplications.md similarity index 100% rename from github-data/pull_requests/141-Q8_K_R8_ Fastest quantized matrix multiplications.md rename to github-data/pull_requests/141 - Q8_K_R8_ Fastest quantized matrix multiplications.md diff --git a/github-data/pull_requests/142-BF16_R16 - 16 interleaved bf16 rows .md b/github-data/pull_requests/142 - BF16_R16 - 16 interleaved bf16 rows.md similarity index 100% rename from github-data/pull_requests/142-BF16_R16 - 16 interleaved bf16 rows .md rename to github-data/pull_requests/142 - BF16_R16 - 16 interleaved bf16 rows.md diff --git a/github-data/pull_requests/143-Slightly faster IQ4_XS_R4 on AVX2.md b/github-data/pull_requests/143 - Slightly faster IQ4_XS_R4 on AVX2.md similarity index 100% rename from github-data/pull_requests/143-Slightly faster IQ4_XS_R4 on AVX2.md rename to github-data/pull_requests/143 - Slightly faster IQ4_XS_R4 on AVX2.md diff --git a/github-data/pull_requests/144-Slightly faster IQ4_K_R4 on AVX2_Zen4.md b/github-data/pull_requests/144 - Slightly faster IQ4_K_R4 on AVX2_Zen4.md similarity index 100% rename from github-data/pull_requests/144-Slightly faster IQ4_K_R4 on AVX2_Zen4.md rename to github-data/pull_requests/144 - Slightly faster IQ4_K_R4 on AVX2_Zen4.md diff --git a/github-data/pull_requests/145-IQ3_K_R4.md b/github-data/pull_requests/145 - IQ3_K_R4.md similarity index 100% rename from github-data/pull_requests/145-IQ3_K_R4.md rename to github-data/pull_requests/145 - IQ3_K_R4.md diff --git a/github-data/pull_requests/146-IQ2_K_R4.md b/github-data/pull_requests/146 - IQ2_K_R4.md similarity index 100% rename from github-data/pull_requests/146-IQ2_K_R4.md rename to github-data/pull_requests/146 - IQ2_K_R4.md diff --git a/github-data/pull_requests/147-Be able to repack tensors at run time.md b/github-data/pull_requests/147 - Be able to repack tensors at run time.md similarity index 100% rename from github-data/pull_requests/147-Be able to repack tensors at run time.md rename 
to github-data/pull_requests/147 - Be able to repack tensors at run time.md diff --git a/github-data/pull_requests/148-Slightly better matrix x vector on Zen4_AVX2 for iq2_k_r4, iq3_k_r4, iq4_k_r4.md b/github-data/pull_requests/148 - Slightly better matrix x vector on Zen4_AVX2 for iq2_k_r4_ iq3_k_r4_ iq.md similarity index 100% rename from github-data/pull_requests/148-Slightly better matrix x vector on Zen4_AVX2 for iq2_k_r4, iq3_k_r4, iq4_k_r4.md rename to github-data/pull_requests/148 - Slightly better matrix x vector on Zen4_AVX2 for iq2_k_r4_ iq3_k_r4_ iq.md diff --git a/github-data/pull_requests/149-IQ5_K_R4.md b/github-data/pull_requests/149 - IQ5_K_R4.md similarity index 100% rename from github-data/pull_requests/149-IQ5_K_R4.md rename to github-data/pull_requests/149 - IQ5_K_R4.md diff --git a/github-data/pull_requests/150-IQ4_KS_R4.md b/github-data/pull_requests/150 - IQ4_KS_R4.md similarity index 100% rename from github-data/pull_requests/150-IQ4_KS_R4.md rename to github-data/pull_requests/150 - IQ4_KS_R4.md diff --git a/github-data/pull_requests/151-fix typo.md b/github-data/pull_requests/151 - fix typo.md similarity index 100% rename from github-data/pull_requests/151-fix typo.md rename to github-data/pull_requests/151 - fix typo.md diff --git a/github-data/pull_requests/152-IQ3_XXS_R4.md b/github-data/pull_requests/152 - IQ3_XXS_R4.md similarity index 100% rename from github-data/pull_requests/152-IQ3_XXS_R4.md rename to github-data/pull_requests/152 - IQ3_XXS_R4.md diff --git a/github-data/pull_requests/153-IQ3_XXS_R4.md b/github-data/pull_requests/153 - IQ3_XXS_R4.md similarity index 100% rename from github-data/pull_requests/153-IQ3_XXS_R4.md rename to github-data/pull_requests/153 - IQ3_XXS_R4.md diff --git a/github-data/pull_requests/154-IQ2_XXS_R4.md b/github-data/pull_requests/154 - IQ2_XXS_R4.md similarity index 100% rename from github-data/pull_requests/154-IQ2_XXS_R4.md rename to github-data/pull_requests/154 - IQ2_XXS_R4.md diff --git 
a/github-data/pull_requests/155-IQ2_XS_R4.md b/github-data/pull_requests/155 - IQ2_XS_R4.md similarity index 100% rename from github-data/pull_requests/155-IQ2_XS_R4.md rename to github-data/pull_requests/155 - IQ2_XS_R4.md diff --git a/github-data/pull_requests/156-IQ2_S_R4.md b/github-data/pull_requests/156 - IQ2_S_R4.md similarity index 100% rename from github-data/pull_requests/156-IQ2_S_R4.md rename to github-data/pull_requests/156 - IQ2_S_R4.md diff --git a/github-data/pull_requests/157-R4 i-quants improvements.md b/github-data/pull_requests/157 - R4 i-quants improvements.md similarity index 100% rename from github-data/pull_requests/157-R4 i-quants improvements.md rename to github-data/pull_requests/157 - R4 i-quants improvements.md diff --git a/github-data/pull_requests/158-Faster R4 legacy quants.md b/github-data/pull_requests/158 - Faster R4 legacy quants.md similarity index 100% rename from github-data/pull_requests/158-Faster R4 legacy quants.md rename to github-data/pull_requests/158 - Faster R4 legacy quants.md diff --git a/github-data/pull_requests/16-Fix Makefile.md b/github-data/pull_requests/16 - Fix Makefile.md similarity index 100% rename from github-data/pull_requests/16-Fix Makefile.md rename to github-data/pull_requests/16 - Fix Makefile.md diff --git a/github-data/pull_requests/161-MSVC fixes.md b/github-data/pull_requests/161 - MSVC fixes.md similarity index 100% rename from github-data/pull_requests/161-MSVC fixes.md rename to github-data/pull_requests/161 - MSVC fixes.md diff --git a/github-data/pull_requests/162-IQ3_S_R4.md b/github-data/pull_requests/162 - IQ3_S_R4.md similarity index 100% rename from github-data/pull_requests/162-IQ3_S_R4.md rename to github-data/pull_requests/162 - IQ3_S_R4.md diff --git a/github-data/pull_requests/163-q4_0_r4_ Use AVX2 version for matrix x vector.md b/github-data/pull_requests/163 - q4_0_r4_ Use AVX2 version for matrix x vector.md similarity index 100% rename from 
github-data/pull_requests/163-q4_0_r4_ Use AVX2 version for matrix x vector.md rename to github-data/pull_requests/163 - q4_0_r4_ Use AVX2 version for matrix x vector.md diff --git a/github-data/pull_requests/168-Falcon3 changes.md b/github-data/pull_requests/168 - Falcon3 changes.md similarity index 100% rename from github-data/pull_requests/168-Falcon3 changes.md rename to github-data/pull_requests/168 - Falcon3 changes.md diff --git a/github-data/pull_requests/169-Be able to re-quantize MS BitNet I2_S models.md b/github-data/pull_requests/169 - Be able to re-quantize MS BitNet I2_S models.md similarity index 100% rename from github-data/pull_requests/169-Be able to re-quantize MS BitNet I2_S models.md rename to github-data/pull_requests/169 - Be able to re-quantize MS BitNet I2_S models.md diff --git a/github-data/pull_requests/17-Merge mainline - Aug 12 2024.md b/github-data/pull_requests/17 - Merge mainline - Aug 12 2024.md similarity index 100% rename from github-data/pull_requests/17-Merge mainline - Aug 12 2024.md rename to github-data/pull_requests/17 - Merge mainline - Aug 12 2024.md diff --git a/github-data/pull_requests/170-MoE fix for R4 quants.md b/github-data/pull_requests/170 - MoE fix for R4 quants.md similarity index 100% rename from github-data/pull_requests/170-MoE fix for R4 quants.md rename to github-data/pull_requests/170 - MoE fix for R4 quants.md diff --git a/github-data/pull_requests/171-Fix lower FA performance for even batch sizes.md b/github-data/pull_requests/171 - Fix lower FA performance for even batch sizes.md similarity index 100% rename from github-data/pull_requests/171-Fix lower FA performance for even batch sizes.md rename to github-data/pull_requests/171 - Fix lower FA performance for even batch sizes.md diff --git a/github-data/pull_requests/172-CPU Flash Attention improvements.md b/github-data/pull_requests/172 - CPU Flash Attention improvements.md similarity index 100% rename from github-data/pull_requests/172-CPU Flash 
Attention improvements.md rename to github-data/pull_requests/172 - CPU Flash Attention improvements.md diff --git a/github-data/pull_requests/173-More Flash Attention improvements.md b/github-data/pull_requests/173 - More Flash Attention improvements.md similarity index 100% rename from github-data/pull_requests/173-More Flash Attention improvements.md rename to github-data/pull_requests/173 - More Flash Attention improvements.md diff --git a/github-data/pull_requests/174-On Zen4 repack fp16 models to bf16_r16.md b/github-data/pull_requests/174 - On Zen4 repack fp16 models to bf16_r16.md similarity index 100% rename from github-data/pull_requests/174-On Zen4 repack fp16 models to bf16_r16.md rename to github-data/pull_requests/174 - On Zen4 repack fp16 models to bf16_r16.md diff --git a/github-data/pull_requests/175-Better BF16 support on AVX2.md b/github-data/pull_requests/175 - Better BF16 support on AVX2.md similarity index 100% rename from github-data/pull_requests/175-Better BF16 support on AVX2.md rename to github-data/pull_requests/175 - Better BF16 support on AVX2.md diff --git a/github-data/pull_requests/176-Deepseek V3 support added.md b/github-data/pull_requests/176 - Deepseek V3 support added.md similarity index 100% rename from github-data/pull_requests/176-Deepseek V3 support added.md rename to github-data/pull_requests/176 - Deepseek V3 support added.md diff --git a/github-data/pull_requests/177-Update chat templates.md b/github-data/pull_requests/177 - Update chat templates.md similarity index 100% rename from github-data/pull_requests/177-Update chat templates.md rename to github-data/pull_requests/177 - Update chat templates.md diff --git a/github-data/pull_requests/178-Interleave 8 rows (Q8_0, IQ4_XS).md b/github-data/pull_requests/178 - Interleave 8 rows _Q8_0_ IQ4_XS_.md similarity index 100% rename from github-data/pull_requests/178-Interleave 8 rows (Q8_0, IQ4_XS).md rename to github-data/pull_requests/178 - Interleave 8 rows _Q8_0_ 
IQ4_XS_.md diff --git a/github-data/pull_requests/179-Minor performance improvements.md b/github-data/pull_requests/179 - Minor performance improvements.md similarity index 100% rename from github-data/pull_requests/179-Minor performance improvements.md rename to github-data/pull_requests/179 - Minor performance improvements.md diff --git a/github-data/pull_requests/180-Deepseek MLA Optimizations.md b/github-data/pull_requests/180 - Deepseek MLA Optimizations.md similarity index 100% rename from github-data/pull_requests/180-Deepseek MLA Optimizations.md rename to github-data/pull_requests/180 - Deepseek MLA Optimizations.md diff --git a/github-data/pull_requests/181-Various.md b/github-data/pull_requests/181 - Various.md similarity index 100% rename from github-data/pull_requests/181-Various.md rename to github-data/pull_requests/181 - Various.md diff --git a/github-data/pull_requests/182-Faster Q4_K_R4 and Q5_K_R4 on AVX2_Zen4.md b/github-data/pull_requests/182 - Faster Q4_K_R4 and Q5_K_R4 on AVX2_Zen4.md similarity index 100% rename from github-data/pull_requests/182-Faster Q4_K_R4 and Q5_K_R4 on AVX2_Zen4.md rename to github-data/pull_requests/182 - Faster Q4_K_R4 and Q5_K_R4 on AVX2_Zen4.md diff --git a/github-data/pull_requests/184-Deepseek-Lite.md b/github-data/pull_requests/184 - Deepseek-Lite.md similarity index 100% rename from github-data/pull_requests/184-Deepseek-Lite.md rename to github-data/pull_requests/184 - Deepseek-Lite.md diff --git a/github-data/pull_requests/185-IQ1_S_R4_ better 1.5 bpw quants.md b/github-data/pull_requests/185 - IQ1_S_R4_ better 1.5 bpw quants.md similarity index 100% rename from github-data/pull_requests/185-IQ1_S_R4_ better 1.5 bpw quants.md rename to github-data/pull_requests/185 - IQ1_S_R4_ better 1.5 bpw quants.md diff --git a/github-data/pull_requests/186-iq1_s_r4_ slightly faster NEON gemm_gemv.md b/github-data/pull_requests/186 - iq1_s_r4_ slightly faster NEON gemm_gemv.md similarity index 100% rename from 
github-data/pull_requests/186-iq1_s_r4_ slightly faster NEON gemm_gemv.md rename to github-data/pull_requests/186 - iq1_s_r4_ slightly faster NEON gemm_gemv.md diff --git a/github-data/pull_requests/187-IQ1_M_R4_ better 1.75 bpw quants.md b/github-data/pull_requests/187 - IQ1_M_R4_ better 1.75 bpw quants.md similarity index 100% rename from github-data/pull_requests/187-IQ1_M_R4_ better 1.75 bpw quants.md rename to github-data/pull_requests/187 - IQ1_M_R4_ better 1.75 bpw quants.md diff --git a/github-data/pull_requests/188-Add optional MLA.md b/github-data/pull_requests/188 - Add optional MLA.md similarity index 100% rename from github-data/pull_requests/188-Add optional MLA.md rename to github-data/pull_requests/188 - Add optional MLA.md diff --git a/github-data/pull_requests/189-Rename q4_0_r4, q8_0_r4 and iq4_xs_r4 to _r8.md b/github-data/pull_requests/189 - Rename q4_0_r4_ q8_0_r4 and iq4_xs_r4 to _r8.md similarity index 100% rename from github-data/pull_requests/189-Rename q4_0_r4, q8_0_r4 and iq4_xs_r4 to _r8.md rename to github-data/pull_requests/189 - Rename q4_0_r4_ q8_0_r4 and iq4_xs_r4 to _r8.md diff --git a/github-data/pull_requests/19-Skip barriers of noops.md b/github-data/pull_requests/19 - Skip barriers of noops.md similarity index 100% rename from github-data/pull_requests/19-Skip barriers of noops.md rename to github-data/pull_requests/19 - Skip barriers of noops.md diff --git a/github-data/pull_requests/190-cuda_ non-contiguous rms norm.md b/github-data/pull_requests/190 - cuda_ non-contiguous rms norm.md similarity index 100% rename from github-data/pull_requests/190-cuda_ non-contiguous rms norm.md rename to github-data/pull_requests/190 - cuda_ non-contiguous rms norm.md diff --git a/github-data/pull_requests/191-Add additional checks for iq1_s_r4 quantization.md b/github-data/pull_requests/191 - Add additional checks for iq1_s_r4 quantization.md similarity index 100% rename from github-data/pull_requests/191-Add additional checks for 
iq1_s_r4 quantization.md rename to github-data/pull_requests/191 - Add additional checks for iq1_s_r4 quantization.md diff --git a/github-data/pull_requests/192-Revert #79.md b/github-data/pull_requests/192 - Revert _79.md similarity index 100% rename from github-data/pull_requests/192-Revert #79.md rename to github-data/pull_requests/192 - Revert _79.md diff --git a/github-data/pull_requests/193-RPC sync.md b/github-data/pull_requests/193 - RPC sync.md similarity index 100% rename from github-data/pull_requests/193-RPC sync.md rename to github-data/pull_requests/193 - RPC sync.md diff --git a/github-data/pull_requests/194-Use Q8_K_128 for IQ1_S_R4 and IQ1_M_R4 matrix multiplications.md b/github-data/pull_requests/194 - Use Q8_K_128 for IQ1_S_R4 and IQ1_M_R4 matrix multiplications.md similarity index 100% rename from github-data/pull_requests/194-Use Q8_K_128 for IQ1_S_R4 and IQ1_M_R4 matrix multiplications.md rename to github-data/pull_requests/194 - Use Q8_K_128 for IQ1_S_R4 and IQ1_M_R4 matrix multiplications.md diff --git a/github-data/pull_requests/195- Deepseek MLA Optimizations V2.md b/github-data/pull_requests/195 - Deepseek MLA Optimizations V2.md similarity index 100% rename from github-data/pull_requests/195- Deepseek MLA Optimizations V2.md rename to github-data/pull_requests/195 - Deepseek MLA Optimizations V2.md diff --git a/github-data/pull_requests/197-FA_ Add option to build all FA kernels.md b/github-data/pull_requests/197 - FA_ Add option to build all FA kernels.md similarity index 100% rename from github-data/pull_requests/197-FA_ Add option to build all FA kernels.md rename to github-data/pull_requests/197 - FA_ Add option to build all FA kernels.md diff --git a/github-data/pull_requests/198- Load all MoE experts during warmup and make warmup 1 token.md b/github-data/pull_requests/198 - Load all MoE experts during warmup and make warmup 1 token.md similarity index 100% rename from github-data/pull_requests/198- Load all MoE experts during 
warmup and make warmup 1 token.md rename to github-data/pull_requests/198 - Load all MoE experts during warmup and make warmup 1 token.md diff --git a/github-data/pull_requests/2-Offload Bitnet token embeddings to the GPU - the right way.md b/github-data/pull_requests/2 - Offload Bitnet token embeddings to the GPU - the right way.md similarity index 100% rename from github-data/pull_requests/2-Offload Bitnet token embeddings to the GPU - the right way.md rename to github-data/pull_requests/2 - Offload Bitnet token embeddings to the GPU - the right way.md diff --git a/github-data/pull_requests/20-iq2_k_ slightly better bpw - accuracy compromise.md b/github-data/pull_requests/20 - iq2_k_ slightly better bpw - accuracy compromise.md similarity index 100% rename from github-data/pull_requests/20-iq2_k_ slightly better bpw - accuracy compromise.md rename to github-data/pull_requests/20 - iq2_k_ slightly better bpw - accuracy compromise.md diff --git a/github-data/pull_requests/200-DeepSeek FA support (CPU only).md b/github-data/pull_requests/200 - DeepSeek FA support _CPU only_.md similarity index 100% rename from github-data/pull_requests/200-DeepSeek FA support (CPU only).md rename to github-data/pull_requests/200 - DeepSeek FA support _CPU only_.md diff --git a/github-data/pull_requests/202-Fix imatrix overprotectiveness.md b/github-data/pull_requests/202 - Fix imatrix overprotectiveness.md similarity index 100% rename from github-data/pull_requests/202-Fix imatrix overprotectiveness.md rename to github-data/pull_requests/202 - Fix imatrix overprotectiveness.md diff --git a/github-data/pull_requests/204-Fix iqk_mul_mat on AVX512 systems that are missing BF16 support.md b/github-data/pull_requests/204 - Fix iqk_mul_mat on AVX512 systems that are missing BF16 support.md similarity index 100% rename from github-data/pull_requests/204-Fix iqk_mul_mat on AVX512 systems that are missing BF16 support.md rename to github-data/pull_requests/204 - Fix iqk_mul_mat on AVX512 
systems that are missing BF16 support.md diff --git a/github-data/pull_requests/205-Faster MLA prompt processing.md b/github-data/pull_requests/205 - Faster MLA prompt processing.md similarity index 100% rename from github-data/pull_requests/205-Faster MLA prompt processing.md rename to github-data/pull_requests/205 - Faster MLA prompt processing.md diff --git a/github-data/pull_requests/206-MLA_ allow Q8_0 K-cache for MLA.md b/github-data/pull_requests/206 - MLA_ allow Q8_0 K-cache for MLA.md similarity index 100% rename from github-data/pull_requests/206-MLA_ allow Q8_0 K-cache for MLA.md rename to github-data/pull_requests/206 - MLA_ allow Q8_0 K-cache for MLA.md diff --git a/github-data/pull_requests/207-Faster CPU TG for GQA models.md b/github-data/pull_requests/207 - Faster CPU TG for GQA models.md similarity index 100% rename from github-data/pull_requests/207-Faster CPU TG for GQA models.md rename to github-data/pull_requests/207 - Faster CPU TG for GQA models.md diff --git a/github-data/pull_requests/208-Q8_KV_ 8-bit quantization type targeting the KV cache.md b/github-data/pull_requests/208 - Q8_KV_ 8-bit quantization type targeting the KV cache.md similarity index 100% rename from github-data/pull_requests/208-Q8_KV_ 8-bit quantization type targeting the KV cache.md rename to github-data/pull_requests/208 - Q8_KV_ 8-bit quantization type targeting the KV cache.md diff --git a/github-data/pull_requests/21-quantize_stats_ print rmse and max error as fraction of _x_.md b/github-data/pull_requests/21 - quantize_stats_ print rmse and max error as fraction of _x_.md similarity index 100% rename from github-data/pull_requests/21-quantize_stats_ print rmse and max error as fraction of _x_.md rename to github-data/pull_requests/21 - quantize_stats_ print rmse and max error as fraction of _x_.md diff --git a/github-data/pull_requests/210-Repack also experts.md b/github-data/pull_requests/210 - Repack also experts.md similarity index 100% rename from 
github-data/pull_requests/210-Repack also experts.md rename to github-data/pull_requests/210 - Repack also experts.md diff --git a/github-data/pull_requests/212-Optimized GEMM_GEMV for IQ1_S.md b/github-data/pull_requests/212 - Optimized GEMM_GEMV for IQ1_S.md similarity index 100% rename from github-data/pull_requests/212-Optimized GEMM_GEMV for IQ1_S.md rename to github-data/pull_requests/212 - Optimized GEMM_GEMV for IQ1_S.md diff --git a/github-data/pull_requests/213-Fix NEON gemm_gemv for legacy quants when row size is not divisible by 128.md b/github-data/pull_requests/213 - Fix NEON gemm_gemv for legacy quants when row size is not divisible by .md similarity index 100% rename from github-data/pull_requests/213-Fix NEON gemm_gemv for legacy quants when row size is not divisible by 128.md rename to github-data/pull_requests/213 - Fix NEON gemm_gemv for legacy quants when row size is not divisible by .md diff --git a/github-data/pull_requests/215-Trying to fix confusion betweem HAVE_FANCY_SIMD and AVX512.md b/github-data/pull_requests/215 - Trying to fix confusion betweem HAVE_FANCY_SIMD and AVX512.md similarity index 100% rename from github-data/pull_requests/215-Trying to fix confusion betweem HAVE_FANCY_SIMD and AVX512.md rename to github-data/pull_requests/215 - Trying to fix confusion betweem HAVE_FANCY_SIMD and AVX512.md diff --git a/github-data/pull_requests/216-Hopefully this really fixes the confusion between AVX512 and FANCY_SIMD.md b/github-data/pull_requests/216 - Hopefully this really fixes the confusion between AVX512 and FANCY_SIMD.md similarity index 100% rename from github-data/pull_requests/216-Hopefully this really fixes the confusion between AVX512 and FANCY_SIMD.md rename to github-data/pull_requests/216 - Hopefully this really fixes the confusion between AVX512 and FANCY_SIMD.md diff --git a/github-data/pull_requests/218-Better strategy for attention matrix multiplications when generating tokens .md b/github-data/pull_requests/218 - Better 
strategy for attention matrix multiplications when generating to.md similarity index 100% rename from github-data/pull_requests/218-Better strategy for attention matrix multiplications when generating tokens .md rename to github-data/pull_requests/218 - Better strategy for attention matrix multiplications when generating to.md diff --git a/github-data/pull_requests/219-Fuse MoE up and gate matrix multiplications.md b/github-data/pull_requests/219 - Fuse MoE up and gate matrix multiplications.md similarity index 100% rename from github-data/pull_requests/219-Fuse MoE up and gate matrix multiplications.md rename to github-data/pull_requests/219 - Fuse MoE up and gate matrix multiplications.md diff --git a/github-data/pull_requests/22-AVX2 quantization for Q8_K.md b/github-data/pull_requests/22 - AVX2 quantization for Q8_K.md similarity index 100% rename from github-data/pull_requests/22-AVX2 quantization for Q8_K.md rename to github-data/pull_requests/22 - AVX2 quantization for Q8_K.md diff --git a/github-data/pull_requests/220-Fix #217.md b/github-data/pull_requests/220 - Fix _217.md similarity index 100% rename from github-data/pull_requests/220-Fix #217.md rename to github-data/pull_requests/220 - Fix _217.md diff --git a/github-data/pull_requests/225- Examples _ Add new sweep-bench benchmark.md b/github-data/pull_requests/225 - Examples _ Add new sweep-bench benchmark.md similarity index 100% rename from github-data/pull_requests/225- Examples _ Add new sweep-bench benchmark.md rename to github-data/pull_requests/225 - Examples _ Add new sweep-bench benchmark.md diff --git a/github-data/pull_requests/226-Fix compilation error with IQK_FA_ALL_QUANTS enabled.md b/github-data/pull_requests/226 - Fix compilation error with IQK_FA_ALL_QUANTS enabled.md similarity index 100% rename from github-data/pull_requests/226-Fix compilation error with IQK_FA_ALL_QUANTS enabled.md rename to github-data/pull_requests/226 - Fix compilation error with IQK_FA_ALL_QUANTS enabled.md 
diff --git a/github-data/pull_requests/229-Fused MoE ffn_up and ffn_gate.md b/github-data/pull_requests/229 - Fused MoE ffn_up and ffn_gate.md similarity index 100% rename from github-data/pull_requests/229-Fused MoE ffn_up and ffn_gate.md rename to github-data/pull_requests/229 - Fused MoE ffn_up and ffn_gate.md diff --git a/github-data/pull_requests/23-iq4_k tweak.md b/github-data/pull_requests/23 - iq4_k tweak.md similarity index 100% rename from github-data/pull_requests/23-iq4_k tweak.md rename to github-data/pull_requests/23 - iq4_k tweak.md diff --git a/github-data/pull_requests/231-Fix #230.md b/github-data/pull_requests/231 - Fix _230.md similarity index 100% rename from github-data/pull_requests/231-Fix #230.md rename to github-data/pull_requests/231 - Fix _230.md diff --git a/github-data/pull_requests/232-Give the user the option to override where model weights are stored.md b/github-data/pull_requests/232 - Give the user the option to override where model weights are stored.md similarity index 100% rename from github-data/pull_requests/232-Give the user the option to override where model weights are stored.md rename to github-data/pull_requests/232 - Give the user the option to override where model weights are stored.md diff --git a/github-data/pull_requests/233-Slightly faster CUDA MLA.md b/github-data/pull_requests/233 - Slightly faster CUDA MLA.md similarity index 100% rename from github-data/pull_requests/233-Slightly faster CUDA MLA.md rename to github-data/pull_requests/233 - Slightly faster CUDA MLA.md diff --git a/github-data/pull_requests/234-Faster MLA on CUDA.md b/github-data/pull_requests/234 - Faster MLA on CUDA.md similarity index 100% rename from github-data/pull_requests/234-Faster MLA on CUDA.md rename to github-data/pull_requests/234 - Faster MLA on CUDA.md diff --git a/github-data/pull_requests/235-Option to use MLA without a transposed cache.md b/github-data/pull_requests/235 - Option to use MLA without a transposed cache.md 
similarity index 100% rename from github-data/pull_requests/235-Option to use MLA without a transposed cache.md rename to github-data/pull_requests/235 - Option to use MLA without a transposed cache.md diff --git a/github-data/pull_requests/236-Feat_lock free server.md b/github-data/pull_requests/236 - Feat_lock free server.md similarity index 100% rename from github-data/pull_requests/236-Feat_lock free server.md rename to github-data/pull_requests/236 - Feat_lock free server.md diff --git a/github-data/pull_requests/237-Reduce size of compute buffers.md b/github-data/pull_requests/237 - Reduce size of compute buffers.md similarity index 100% rename from github-data/pull_requests/237-Reduce size of compute buffers.md rename to github-data/pull_requests/237 - Reduce size of compute buffers.md diff --git a/github-data/pull_requests/238-A better way to measure the cost of ggml_barrier.md b/github-data/pull_requests/238 - A better way to measure the cost of ggml_barrier.md similarity index 100% rename from github-data/pull_requests/238-A better way to measure the cost of ggml_barrier.md rename to github-data/pull_requests/238 - A better way to measure the cost of ggml_barrier.md diff --git a/github-data/pull_requests/239-SER - Smart Expert Reduction.md b/github-data/pull_requests/239 - SER - Smart Expert Reduction.md similarity index 100% rename from github-data/pull_requests/239-SER - Smart Expert Reduction.md rename to github-data/pull_requests/239 - SER - Smart Expert Reduction.md diff --git a/github-data/pull_requests/24-softcap_ minor improvement.md b/github-data/pull_requests/24 - softcap_ minor improvement.md similarity index 100% rename from github-data/pull_requests/24-softcap_ minor improvement.md rename to github-data/pull_requests/24 - softcap_ minor improvement.md diff --git a/github-data/pull_requests/240-Flash MLA (CPU only).md b/github-data/pull_requests/240 - Flash MLA _CPU only_.md similarity index 100% rename from github-data/pull_requests/240-Flash 
MLA (CPU only).md rename to github-data/pull_requests/240 - Flash MLA _CPU only_.md diff --git a/github-data/pull_requests/241-DeepSeek CUDA Flash Attention .md b/github-data/pull_requests/241 - DeepSeek CUDA Flash Attention.md similarity index 100% rename from github-data/pull_requests/241-DeepSeek CUDA Flash Attention .md rename to github-data/pull_requests/241 - DeepSeek CUDA Flash Attention.md diff --git a/github-data/pull_requests/243-Better FlashMLA.md b/github-data/pull_requests/243 - Better FlashMLA.md similarity index 100% rename from github-data/pull_requests/243-Better FlashMLA.md rename to github-data/pull_requests/243 - Better FlashMLA.md diff --git a/github-data/pull_requests/244-Custom quantization rules with regular expressions.md b/github-data/pull_requests/244 - Custom quantization rules with regular expressions.md similarity index 100% rename from github-data/pull_requests/244-Custom quantization rules with regular expressions.md rename to github-data/pull_requests/244 - Custom quantization rules with regular expressions.md diff --git a/github-data/pull_requests/246-Faster FlashMLA prompt processing.md b/github-data/pull_requests/246 - Faster FlashMLA prompt processing.md similarity index 100% rename from github-data/pull_requests/246-Faster FlashMLA prompt processing.md rename to github-data/pull_requests/246 - Faster FlashMLA prompt processing.md diff --git a/github-data/pull_requests/247-FlashMLA on CUDA.md b/github-data/pull_requests/247 - FlashMLA on CUDA.md similarity index 100% rename from github-data/pull_requests/247-FlashMLA on CUDA.md rename to github-data/pull_requests/247 - FlashMLA on CUDA.md diff --git a/github-data/pull_requests/248-Faster MoE token generation on CUDA.md b/github-data/pull_requests/248 - Faster MoE token generation on CUDA.md similarity index 100% rename from github-data/pull_requests/248-Faster MoE token generation on CUDA.md rename to github-data/pull_requests/248 - Faster MoE token generation on CUDA.md diff 
--git a/github-data/pull_requests/250-DeepSeek imatrix stuff.md b/github-data/pull_requests/250 - DeepSeek imatrix stuff.md similarity index 100% rename from github-data/pull_requests/250-DeepSeek imatrix stuff.md rename to github-data/pull_requests/250 - DeepSeek imatrix stuff.md diff --git a/github-data/pull_requests/251-Try using fp32 for FlashMLA.md b/github-data/pull_requests/251 - Try using fp32 for FlashMLA.md similarity index 100% rename from github-data/pull_requests/251-Try using fp32 for FlashMLA.md rename to github-data/pull_requests/251 - Try using fp32 for FlashMLA.md diff --git a/github-data/pull_requests/252-MLA-2_ Allow usage of q8_0 for KV cache on CUDA.md b/github-data/pull_requests/252 - MLA-2_ Allow usage of q8_0 for KV cache on CUDA.md similarity index 100% rename from github-data/pull_requests/252-MLA-2_ Allow usage of q8_0 for KV cache on CUDA.md rename to github-data/pull_requests/252 - MLA-2_ Allow usage of q8_0 for KV cache on CUDA.md diff --git a/github-data/pull_requests/253-FlashMLA-2 (CPU)_ faster and smaller compute buffer size.md b/github-data/pull_requests/253 - FlashMLA-2 _CPU_ faster and smaller compute buffer size.md similarity index 100% rename from github-data/pull_requests/253-FlashMLA-2 (CPU)_ faster and smaller compute buffer size.md rename to github-data/pull_requests/253 - FlashMLA-2 _CPU_ faster and smaller compute buffer size.md diff --git a/github-data/pull_requests/259-Prepare wk_b tensors of DeepSeek models on the fly.md b/github-data/pull_requests/259 - Prepare wk_b tensors of DeepSeek models on the fly.md similarity index 100% rename from github-data/pull_requests/259-Prepare wk_b tensors of DeepSeek models on the fly.md rename to github-data/pull_requests/259 - Prepare wk_b tensors of DeepSeek models on the fly.md diff --git a/github-data/pull_requests/260-FlashMLA-2_ reduce compute buffer size (CUDA and CPU).md b/github-data/pull_requests/260 - FlashMLA-2_ reduce compute buffer size _CUDA and CPU_.md similarity 
index 100% rename from github-data/pull_requests/260-FlashMLA-2_ reduce compute buffer size (CUDA and CPU).md rename to github-data/pull_requests/260 - FlashMLA-2_ reduce compute buffer size _CUDA and CPU_.md diff --git a/github-data/pull_requests/261-Compile time option to use bf16 for quants without MMQ kernels.md b/github-data/pull_requests/261 - Compile time option to use bf16 for quants without MMQ kernels.md similarity index 100% rename from github-data/pull_requests/261-Compile time option to use bf16 for quants without MMQ kernels.md rename to github-data/pull_requests/261 - Compile time option to use bf16 for quants without MMQ kernels.md diff --git a/github-data/pull_requests/262-Fix #261.md b/github-data/pull_requests/262 - Fix _261.md similarity index 100% rename from github-data/pull_requests/262-Fix #261.md rename to github-data/pull_requests/262 - Fix _261.md diff --git a/github-data/pull_requests/264-Make Q8_0 KV cache work with FlasMLA-2 on CUDA.md b/github-data/pull_requests/264 - Make Q8_0 KV cache work with FlasMLA-2 on CUDA.md similarity index 100% rename from github-data/pull_requests/264-Make Q8_0 KV cache work with FlasMLA-2 on CUDA.md rename to github-data/pull_requests/264 - Make Q8_0 KV cache work with FlasMLA-2 on CUDA.md diff --git a/github-data/pull_requests/265-Allow q8_0 cache on the CPU for FlashMLA-2.md b/github-data/pull_requests/265 - Allow q8_0 cache on the CPU for FlashMLA-2.md similarity index 100% rename from github-data/pull_requests/265-Allow q8_0 cache on the CPU for FlashMLA-2.md rename to github-data/pull_requests/265 - Allow q8_0 cache on the CPU for FlashMLA-2.md diff --git a/github-data/pull_requests/268-Prevent FlashMLA-1 from running on CUDA.md b/github-data/pull_requests/268 - Prevent FlashMLA-1 from running on CUDA.md similarity index 100% rename from github-data/pull_requests/268-Prevent FlashMLA-1 from running on CUDA.md rename to github-data/pull_requests/268 - Prevent FlashMLA-1 from running on CUDA.md diff 
--git a/github-data/pull_requests/269-Fix ggml_compute_forward_dup_q.md b/github-data/pull_requests/269 - Fix ggml_compute_forward_dup_q.md similarity index 100% rename from github-data/pull_requests/269-Fix ggml_compute_forward_dup_q.md rename to github-data/pull_requests/269 - Fix ggml_compute_forward_dup_q.md diff --git a/github-data/pull_requests/27-Faster Gemma2.md b/github-data/pull_requests/27 - Faster Gemma2.md similarity index 100% rename from github-data/pull_requests/27-Faster Gemma2.md rename to github-data/pull_requests/27 - Faster Gemma2.md diff --git a/github-data/pull_requests/270-Honor mmap setting when using tensor overrides.md b/github-data/pull_requests/270 - Honor mmap setting when using tensor overrides.md similarity index 100% rename from github-data/pull_requests/270-Honor mmap setting when using tensor overrides.md rename to github-data/pull_requests/270 - Honor mmap setting when using tensor overrides.md diff --git a/github-data/pull_requests/272-Convert models to row-interleaved quants using the quantize tool.md b/github-data/pull_requests/272 - Convert models to row-interleaved quants using the quantize tool.md similarity index 100% rename from github-data/pull_requests/272-Convert models to row-interleaved quants using the quantize tool.md rename to github-data/pull_requests/272 - Convert models to row-interleaved quants using the quantize tool.md diff --git a/github-data/pull_requests/273-FlashMLA-3_ the best of both worlds (CPU only).md b/github-data/pull_requests/273 - FlashMLA-3_ the best of both worlds _CPU only_.md similarity index 100% rename from github-data/pull_requests/273-FlashMLA-3_ the best of both worlds (CPU only).md rename to github-data/pull_requests/273 - FlashMLA-3_ the best of both worlds _CPU only_.md diff --git a/github-data/pull_requests/274-Specify tensor name regex for tensors to be repacked.md b/github-data/pull_requests/274 - Specify tensor name regex for tensors to be repacked.md similarity index 100% rename 
from github-data/pull_requests/274-Specify tensor name regex for tensors to be repacked.md rename to github-data/pull_requests/274 - Specify tensor name regex for tensors to be repacked.md diff --git a/github-data/pull_requests/275-Fix bug_ missing parentheses in logical expression.md b/github-data/pull_requests/275 - Fix bug_ missing parentheses in logical expression.md similarity index 100% rename from github-data/pull_requests/275-Fix bug_ missing parentheses in logical expression.md rename to github-data/pull_requests/275 - Fix bug_ missing parentheses in logical expression.md diff --git a/github-data/pull_requests/276-Add Gemma3 support (text only).md b/github-data/pull_requests/276 - Add Gemma3 support _text only_.md similarity index 100% rename from github-data/pull_requests/276-Add Gemma3 support (text only).md rename to github-data/pull_requests/276 - Add Gemma3 support _text only_.md diff --git a/github-data/pull_requests/277-Attempt to improve FlashMLA on the CPU.md b/github-data/pull_requests/277 - Attempt to improve FlashMLA on the CPU.md similarity index 100% rename from github-data/pull_requests/277-Attempt to improve FlashMLA on the CPU.md rename to github-data/pull_requests/277 - Attempt to improve FlashMLA on the CPU.md diff --git a/github-data/pull_requests/278-Test transparent huge pages on Linux.md b/github-data/pull_requests/278 - Test transparent huge pages on Linux.md similarity index 100% rename from github-data/pull_requests/278-Test transparent huge pages on Linux.md rename to github-data/pull_requests/278 - Test transparent huge pages on Linux.md diff --git a/github-data/pull_requests/279-Fighting with cmake.md b/github-data/pull_requests/279 - Fighting with cmake.md similarity index 100% rename from github-data/pull_requests/279-Fighting with cmake.md rename to github-data/pull_requests/279 - Fighting with cmake.md diff --git a/github-data/pull_requests/28-Binary KQ mask.md b/github-data/pull_requests/28 - Binary KQ mask.md similarity 
index 100% rename from github-data/pull_requests/28-Binary KQ mask.md rename to github-data/pull_requests/28 - Binary KQ mask.md diff --git a/github-data/pull_requests/280-Native build ooption for CUDA when GGML_NATIVE is set.md b/github-data/pull_requests/280 - Native build ooption for CUDA when GGML_NATIVE is set.md similarity index 100% rename from github-data/pull_requests/280-Native build ooption for CUDA when GGML_NATIVE is set.md rename to github-data/pull_requests/280 - Native build ooption for CUDA when GGML_NATIVE is set.md diff --git a/github-data/pull_requests/282-Improve DeepSeek batched processing speed.md b/github-data/pull_requests/282 - Improve DeepSeek batched processing speed.md similarity index 100% rename from github-data/pull_requests/282-Improve DeepSeek batched processing speed.md rename to github-data/pull_requests/282 - Improve DeepSeek batched processing speed.md diff --git a/github-data/pull_requests/283-CUDA_ better MoE implementation.md b/github-data/pull_requests/283 - CUDA_ better MoE implementation.md similarity index 100% rename from github-data/pull_requests/283-CUDA_ better MoE implementation.md rename to github-data/pull_requests/283 - CUDA_ better MoE implementation.md diff --git a/github-data/pull_requests/284-llama-bench_ enable having different number of threads for tg and pp.md b/github-data/pull_requests/284 - llama-bench_ enable having different number of threads for tg and pp.md similarity index 100% rename from github-data/pull_requests/284-llama-bench_ enable having different number of threads for tg and pp.md rename to github-data/pull_requests/284 - llama-bench_ enable having different number of threads for tg and pp.md diff --git a/github-data/pull_requests/287-Is this better for DeepSeek-R1_.md b/github-data/pull_requests/287 - Is this better for DeepSeek-R1_.md similarity index 100% rename from github-data/pull_requests/287-Is this better for DeepSeek-R1_.md rename to github-data/pull_requests/287 - Is this better 
for DeepSeek-R1_.md diff --git a/github-data/pull_requests/289-Update sweep bench (depracating .jsonl support).md b/github-data/pull_requests/289 - Update sweep bench _depracating .jsonl support_.md similarity index 100% rename from github-data/pull_requests/289-Update sweep bench (depracating .jsonl support).md rename to github-data/pull_requests/289 - Update sweep bench _depracating .jsonl support_.md diff --git a/github-data/pull_requests/290-mmap backed KV cache.md b/github-data/pull_requests/290 - mmap backed KV cache.md similarity index 100% rename from github-data/pull_requests/290-mmap backed KV cache.md rename to github-data/pull_requests/290 - mmap backed KV cache.md diff --git a/github-data/pull_requests/291-Disable Zen4 optimizations for Q8_0_Q8_0_R8.md b/github-data/pull_requests/291 - Disable Zen4 optimizations for Q8_0_Q8_0_R8.md similarity index 100% rename from github-data/pull_requests/291-Disable Zen4 optimizations for Q8_0_Q8_0_R8.md rename to github-data/pull_requests/291 - Disable Zen4 optimizations for Q8_0_Q8_0_R8.md diff --git a/github-data/pull_requests/292-Use bf16 instead of fp16 block scales for q8_1.md b/github-data/pull_requests/292 - Use bf16 instead of fp16 block scales for q8_1.md similarity index 100% rename from github-data/pull_requests/292-Use bf16 instead of fp16 block scales for q8_1.md rename to github-data/pull_requests/292 - Use bf16 instead of fp16 block scales for q8_1.md diff --git a/github-data/pull_requests/294-Make sure tensor row size is multiple of block size also when quantizing with --pure.md b/github-data/pull_requests/294 - Make sure tensor row size is multiple of block size also when quantizin.md similarity index 100% rename from github-data/pull_requests/294-Make sure tensor row size is multiple of block size also when quantizing with --pure.md rename to github-data/pull_requests/294 - Make sure tensor row size is multiple of block size also when quantizin.md diff --git 
a/github-data/pull_requests/295-Quantization improvements.md b/github-data/pull_requests/295 - Quantization improvements.md similarity index 100% rename from github-data/pull_requests/295-Quantization improvements.md rename to github-data/pull_requests/295 - Quantization improvements.md diff --git a/github-data/pull_requests/298-Update gguf-py constants.md b/github-data/pull_requests/298 - Update gguf-py constants.md similarity index 100% rename from github-data/pull_requests/298-Update gguf-py constants.md rename to github-data/pull_requests/298 - Update gguf-py constants.md diff --git a/github-data/pull_requests/299-Additional guards for interleaved quants.md b/github-data/pull_requests/299 - Additional guards for interleaved quants.md similarity index 100% rename from github-data/pull_requests/299-Additional guards for interleaved quants.md rename to github-data/pull_requests/299 - Additional guards for interleaved quants.md diff --git a/github-data/pull_requests/3-Merge mainline llama.cpp.md b/github-data/pull_requests/3 - Merge mainline llama.cpp.md similarity index 100% rename from github-data/pull_requests/3-Merge mainline llama.cpp.md rename to github-data/pull_requests/3 - Merge mainline llama.cpp.md diff --git a/github-data/pull_requests/301-Fix #300.md b/github-data/pull_requests/301 - Fix _300.md similarity index 100% rename from github-data/pull_requests/301-Fix #300.md rename to github-data/pull_requests/301 - Fix _300.md diff --git a/github-data/pull_requests/302-Quantization improvements (2).md b/github-data/pull_requests/302 - Quantization improvements _2_.md similarity index 100% rename from github-data/pull_requests/302-Quantization improvements (2).md rename to github-data/pull_requests/302 - Quantization improvements _2_.md diff --git a/github-data/pull_requests/303-Fix ARM_NEON build failure due to q8_2.md b/github-data/pull_requests/303 - Fix ARM_NEON build failure due to q8_2.md similarity index 100% rename from 
github-data/pull_requests/303-Fix ARM_NEON build failure due to q8_2.md rename to github-data/pull_requests/303 - Fix ARM_NEON build failure due to q8_2.md diff --git a/github-data/pull_requests/307-Metal_ much faster MoE prompt processing.md b/github-data/pull_requests/307 - Metal_ much faster MoE prompt processing.md similarity index 100% rename from github-data/pull_requests/307-Metal_ much faster MoE prompt processing.md rename to github-data/pull_requests/307 - Metal_ much faster MoE prompt processing.md diff --git a/github-data/pull_requests/309-Fix GCC compilation errors on ARM.md b/github-data/pull_requests/309 - Fix GCC compilation errors on ARM.md similarity index 100% rename from github-data/pull_requests/309-Fix GCC compilation errors on ARM.md rename to github-data/pull_requests/309 - Fix GCC compilation errors on ARM.md diff --git a/github-data/pull_requests/31-Fix build when iqk_mul_mat is disabled.md b/github-data/pull_requests/31 - Fix build when iqk_mul_mat is disabled.md similarity index 100% rename from github-data/pull_requests/31-Fix build when iqk_mul_mat is disabled.md rename to github-data/pull_requests/31 - Fix build when iqk_mul_mat is disabled.md diff --git a/github-data/pull_requests/310-Metal_ FA and FlashMLA.md b/github-data/pull_requests/310 - Metal_ FA and FlashMLA.md similarity index 100% rename from github-data/pull_requests/310-Metal_ FA and FlashMLA.md rename to github-data/pull_requests/310 - Metal_ FA and FlashMLA.md diff --git a/github-data/pull_requests/311-Add -flax-vector-conversions for GCC on ARM.md b/github-data/pull_requests/311 - Add -flax-vector-conversions for GCC on ARM.md similarity index 100% rename from github-data/pull_requests/311-Add -flax-vector-conversions for GCC on ARM.md rename to github-data/pull_requests/311 - Add -flax-vector-conversions for GCC on ARM.md diff --git a/github-data/pull_requests/312-Improved IQ2_XS quantization.md b/github-data/pull_requests/312 - Improved IQ2_XS quantization.md 
similarity index 100% rename from github-data/pull_requests/312-Improved IQ2_XS quantization.md rename to github-data/pull_requests/312 - Improved IQ2_XS quantization.md diff --git a/github-data/pull_requests/313-We need to synchronize before using device to host async memcpy.md b/github-data/pull_requests/313 - We need to synchronize before using device to host async memcpy.md similarity index 100% rename from github-data/pull_requests/313-We need to synchronize before using device to host async memcpy.md rename to github-data/pull_requests/313 - We need to synchronize before using device to host async memcpy.md diff --git a/github-data/pull_requests/315-Try not repacking q8_0 for FA computations.md b/github-data/pull_requests/315 - Try not repacking q8_0 for FA computations.md similarity index 100% rename from github-data/pull_requests/315-Try not repacking q8_0 for FA computations.md rename to github-data/pull_requests/315 - Try not repacking q8_0 for FA computations.md diff --git a/github-data/pull_requests/317-Add copyright notices.md b/github-data/pull_requests/317 - Add copyright notices.md similarity index 100% rename from github-data/pull_requests/317-Add copyright notices.md rename to github-data/pull_requests/317 - Add copyright notices.md diff --git a/github-data/pull_requests/318-Use links for ggml_llama.cpp authors.md b/github-data/pull_requests/318 - Use links for ggml_llama.cpp authors.md similarity index 100% rename from github-data/pull_requests/318-Use links for ggml_llama.cpp authors.md rename to github-data/pull_requests/318 - Use links for ggml_llama.cpp authors.md diff --git a/github-data/pull_requests/32-Zen4 Flash Attention.md b/github-data/pull_requests/32 - Zen4 Flash Attention.md similarity index 100% rename from github-data/pull_requests/32-Zen4 Flash Attention.md rename to github-data/pull_requests/32 - Zen4 Flash Attention.md diff --git a/github-data/pull_requests/320-Guard against attempts to use MLA for non-MLA models.md 
b/github-data/pull_requests/320 - Guard against attempts to use MLA for non-MLA models.md similarity index 100% rename from github-data/pull_requests/320-Guard against attempts to use MLA for non-MLA models.md rename to github-data/pull_requests/320 - Guard against attempts to use MLA for non-MLA models.md diff --git a/github-data/pull_requests/321-LlaMA-4 support (text only).md b/github-data/pull_requests/321 - LlaMA-4 support _text only_.md similarity index 100% rename from github-data/pull_requests/321-LlaMA-4 support (text only).md rename to github-data/pull_requests/321 - LlaMA-4 support _text only_.md diff --git a/github-data/pull_requests/324-Correct L4 rms_norm.md b/github-data/pull_requests/324 - Correct L4 rms_norm.md similarity index 100% rename from github-data/pull_requests/324-Correct L4 rms_norm.md rename to github-data/pull_requests/324 - Correct L4 rms_norm.md diff --git a/github-data/pull_requests/325-Fix KLD precision.md b/github-data/pull_requests/325 - Fix KLD precision.md similarity index 100% rename from github-data/pull_requests/325-Fix KLD precision.md rename to github-data/pull_requests/325 - Fix KLD precision.md diff --git a/github-data/pull_requests/326-WIP Compute per layer LIM Scores during imatrix.md b/github-data/pull_requests/326 - WIP Compute per layer LIM Scores during imatrix.md similarity index 100% rename from github-data/pull_requests/326-WIP Compute per layer LIM Scores during imatrix.md rename to github-data/pull_requests/326 - WIP Compute per layer LIM Scores during imatrix.md diff --git a/github-data/pull_requests/327-Improved IQ1_M quantization.md b/github-data/pull_requests/327 - Improved IQ1_M quantization.md similarity index 100% rename from github-data/pull_requests/327-Improved IQ1_M quantization.md rename to github-data/pull_requests/327 - Improved IQ1_M quantization.md diff --git a/github-data/pull_requests/328-imatrix_ collect layer influence statistics.md b/github-data/pull_requests/328 - imatrix_ collect layer 
influence statistics.md similarity index 100% rename from github-data/pull_requests/328-imatrix_ collect layer influence statistics.md rename to github-data/pull_requests/328 - imatrix_ collect layer influence statistics.md diff --git a/github-data/pull_requests/329-Add ability to hide imatrix details in llama-quantize.md b/github-data/pull_requests/329 - Add ability to hide imatrix details in llama-quantize.md similarity index 100% rename from github-data/pull_requests/329-Add ability to hide imatrix details in llama-quantize.md rename to github-data/pull_requests/329 - Add ability to hide imatrix details in llama-quantize.md diff --git a/github-data/pull_requests/33-Do not process prompts containing binary data for escapes.md b/github-data/pull_requests/33 - Do not process prompts containing binary data for escapes.md similarity index 100% rename from github-data/pull_requests/33-Do not process prompts containing binary data for escapes.md rename to github-data/pull_requests/33 - Do not process prompts containing binary data for escapes.md diff --git a/github-data/pull_requests/330-Allow q8_0 KV cache for head size 256.md b/github-data/pull_requests/330 - Allow q8_0 KV cache for head size 256.md similarity index 100% rename from github-data/pull_requests/330-Allow q8_0 KV cache for head size 256.md rename to github-data/pull_requests/330 - Allow q8_0 KV cache for head size 256.md diff --git a/github-data/pull_requests/331-Better gemm_gemv on AVX2 fr q4_0_r8.md b/github-data/pull_requests/331 - Better gemm_gemv on AVX2 fr q4_0_r8.md similarity index 100% rename from github-data/pull_requests/331-Better gemm_gemv on AVX2 fr q4_0_r8.md rename to github-data/pull_requests/331 - Better gemm_gemv on AVX2 fr q4_0_r8.md diff --git a/github-data/pull_requests/332-Better TG performance for GQA models (CPU).md b/github-data/pull_requests/332 - Better TG performance for GQA models _CPU_.md similarity index 100% rename from github-data/pull_requests/332-Better TG performance 
for GQA models (CPU).md rename to github-data/pull_requests/332 - Better TG performance for GQA models _CPU_.md diff --git a/github-data/pull_requests/333-Support GLM-4-0414 models based on piDack's mainline PR.md b/github-data/pull_requests/333 - Support GLM-4-0414 models based on piDack_s mainline PR.md similarity index 100% rename from github-data/pull_requests/333-Support GLM-4-0414 models based on piDack's mainline PR.md rename to github-data/pull_requests/333 - Support GLM-4-0414 models based on piDack_s mainline PR.md diff --git a/github-data/pull_requests/336-Fix termux_android build.md b/github-data/pull_requests/336 - Fix termux_android build.md similarity index 100% rename from github-data/pull_requests/336-Fix termux_android build.md rename to github-data/pull_requests/336 - Fix termux_android build.md diff --git a/github-data/pull_requests/337-Add support for bitnet2b_2501 model.md b/github-data/pull_requests/337 - Add support for bitnet2b_2501 model.md similarity index 100% rename from github-data/pull_requests/337-Add support for bitnet2b_2501 model.md rename to github-data/pull_requests/337 - Add support for bitnet2b_2501 model.md diff --git a/github-data/pull_requests/338-BitNet adjustments.md b/github-data/pull_requests/338 - BitNet adjustments.md similarity index 100% rename from github-data/pull_requests/338-BitNet adjustments.md rename to github-data/pull_requests/338 - BitNet adjustments.md diff --git a/github-data/pull_requests/341-Add support for Cohere2.md b/github-data/pull_requests/341 - Add support for Cohere2.md similarity index 100% rename from github-data/pull_requests/341-Add support for Cohere2.md rename to github-data/pull_requests/341 - Add support for Cohere2.md diff --git a/github-data/pull_requests/342-Fix LLaMA-4 attention.md b/github-data/pull_requests/342 - Fix LLaMA-4 attention.md similarity index 100% rename from github-data/pull_requests/342-Fix LLaMA-4 attention.md rename to github-data/pull_requests/342 - Fix LLaMA-4 
attention.md diff --git a/github-data/pull_requests/343-cuda_ use switch in constexpr funcs.md b/github-data/pull_requests/343 - cuda_ use switch in constexpr funcs.md similarity index 100% rename from github-data/pull_requests/343-cuda_ use switch in constexpr funcs.md rename to github-data/pull_requests/343 - cuda_ use switch in constexpr funcs.md diff --git a/github-data/pull_requests/344-Add GLM-4-0414 Model Support.md b/github-data/pull_requests/344 - Add GLM-4-0414 Model Support.md similarity index 100% rename from github-data/pull_requests/344-Add GLM-4-0414 Model Support.md rename to github-data/pull_requests/344 - Add GLM-4-0414 Model Support.md diff --git a/github-data/pull_requests/346-Fix FA on ARM CPUs.md b/github-data/pull_requests/346 - Fix FA on ARM CPUs.md similarity index 100% rename from github-data/pull_requests/346-Fix FA on ARM CPUs.md rename to github-data/pull_requests/346 - Fix FA on ARM CPUs.md diff --git a/github-data/pull_requests/347-Add ability to manually set arch flags.md b/github-data/pull_requests/347 - Add ability to manually set arch flags.md similarity index 100% rename from github-data/pull_requests/347-Add ability to manually set arch flags.md rename to github-data/pull_requests/347 - Add ability to manually set arch flags.md diff --git a/github-data/pull_requests/348-Fix q4_1 and q5_1 on Arm.md b/github-data/pull_requests/348 - Fix q4_1 and q5_1 on Arm.md similarity index 100% rename from github-data/pull_requests/348-Fix q4_1 and q5_1 on Arm.md rename to github-data/pull_requests/348 - Fix q4_1 and q5_1 on Arm.md diff --git a/github-data/pull_requests/349-Fix division by zero bug.md b/github-data/pull_requests/349 - Fix division by zero bug.md similarity index 100% rename from github-data/pull_requests/349-Fix division by zero bug.md rename to github-data/pull_requests/349 - Fix division by zero bug.md diff --git a/github-data/pull_requests/35-Fix Zen4 Flash Attention.md b/github-data/pull_requests/35 - Fix Zen4 Flash 
Attention.md similarity index 100% rename from github-data/pull_requests/35-Fix Zen4 Flash Attention.md rename to github-data/pull_requests/35 - Fix Zen4 Flash Attention.md diff --git a/github-data/pull_requests/351-CPU FA improvements .md b/github-data/pull_requests/351 - CPU FA improvements.md similarity index 100% rename from github-data/pull_requests/351-CPU FA improvements .md rename to github-data/pull_requests/351 - CPU FA improvements.md diff --git a/github-data/pull_requests/352-Update README.md.md b/github-data/pull_requests/352 - Update README.md.md similarity index 100% rename from github-data/pull_requests/352-Update README.md.md rename to github-data/pull_requests/352 - Update README.md.md diff --git a/github-data/pull_requests/355-Apply Qwen3 PR from llama.cpp.md b/github-data/pull_requests/355 - Apply Qwen3 PR from llama.cpp.md similarity index 100% rename from github-data/pull_requests/355-Apply Qwen3 PR from llama.cpp.md rename to github-data/pull_requests/355 - Apply Qwen3 PR from llama.cpp.md diff --git a/github-data/pull_requests/356-Add missing enum values for qwen3 and qwen3moe.md b/github-data/pull_requests/356 - Add missing enum values for qwen3 and qwen3moe.md similarity index 100% rename from github-data/pull_requests/356-Add missing enum values for qwen3 and qwen3moe.md rename to github-data/pull_requests/356 - Add missing enum values for qwen3 and qwen3moe.md diff --git a/github-data/pull_requests/36-Zen4 Flash Attnetion 2.md b/github-data/pull_requests/36 - Zen4 Flash Attnetion 2.md similarity index 100% rename from github-data/pull_requests/36-Zen4 Flash Attnetion 2.md rename to github-data/pull_requests/36 - Zen4 Flash Attnetion 2.md diff --git a/github-data/pull_requests/360-Fix IQK_FA_ALL_QUANTS on AVX2.md b/github-data/pull_requests/360 - Fix IQK_FA_ALL_QUANTS on AVX2.md similarity index 100% rename from github-data/pull_requests/360-Fix IQK_FA_ALL_QUANTS on AVX2.md rename to github-data/pull_requests/360 - Fix IQK_FA_ALL_QUANTS 
on AVX2.md diff --git a/github-data/pull_requests/364-Fix FA bug on AVX2.md b/github-data/pull_requests/364 - Fix FA bug on AVX2.md similarity index 100% rename from github-data/pull_requests/364-Fix FA bug on AVX2.md rename to github-data/pull_requests/364 - Fix FA bug on AVX2.md diff --git a/github-data/pull_requests/366-Add support for new Bitnet model architecture name.md b/github-data/pull_requests/366 - Add support for new Bitnet model architecture name.md similarity index 100% rename from github-data/pull_requests/366-Add support for new Bitnet model architecture name.md rename to github-data/pull_requests/366 - Add support for new Bitnet model architecture name.md diff --git a/github-data/pull_requests/368-Trying to fix iq1_s_r4_iq1_m_r4 quantization failure.md b/github-data/pull_requests/368 - Trying to fix iq1_s_r4_iq1_m_r4 quantization failure.md similarity index 100% rename from github-data/pull_requests/368-Trying to fix iq1_s_r4_iq1_m_r4 quantization failure.md rename to github-data/pull_requests/368 - Trying to fix iq1_s_r4_iq1_m_r4 quantization failure.md diff --git a/github-data/pull_requests/369-cmake_ force MSVC compiler charset to utf-8.md b/github-data/pull_requests/369 - cmake_ force MSVC compiler charset to utf-8.md similarity index 100% rename from github-data/pull_requests/369-cmake_ force MSVC compiler charset to utf-8.md rename to github-data/pull_requests/369 - cmake_ force MSVC compiler charset to utf-8.md diff --git a/github-data/pull_requests/37-Performance improvements for legacy quants on ARM_NEON.md b/github-data/pull_requests/37 - Performance improvements for legacy quants on ARM_NEON.md similarity index 100% rename from github-data/pull_requests/37-Performance improvements for legacy quants on ARM_NEON.md rename to github-data/pull_requests/37 - Performance improvements for legacy quants on ARM_NEON.md diff --git a/github-data/pull_requests/370-CUDA_ faster FA TG for GQA models .md b/github-data/pull_requests/370 - CUDA_ faster 
FA TG for GQA models.md similarity index 100% rename from github-data/pull_requests/370-CUDA_ faster FA TG for GQA models .md rename to github-data/pull_requests/370 - CUDA_ faster FA TG for GQA models.md diff --git a/github-data/pull_requests/371-Another attempt to fix #367.md b/github-data/pull_requests/371 - Another attempt to fix _367.md similarity index 100% rename from github-data/pull_requests/371-Another attempt to fix #367.md rename to github-data/pull_requests/371 - Another attempt to fix _367.md diff --git a/github-data/pull_requests/374-CUDA_ MMQ for IQ4_KS.md b/github-data/pull_requests/374 - CUDA_ MMQ for IQ4_KS.md similarity index 100% rename from github-data/pull_requests/374-CUDA_ MMQ for IQ4_KS.md rename to github-data/pull_requests/374 - CUDA_ MMQ for IQ4_KS.md diff --git a/github-data/pull_requests/375-Add batch warmup to sweep-bench.md b/github-data/pull_requests/375 - Add batch warmup to sweep-bench.md similarity index 100% rename from github-data/pull_requests/375-Add batch warmup to sweep-bench.md rename to github-data/pull_requests/375 - Add batch warmup to sweep-bench.md diff --git a/github-data/pull_requests/377-Support for Llama-3-Nemotron models.md b/github-data/pull_requests/377 - Support for Llama-3-Nemotron models.md similarity index 100% rename from github-data/pull_requests/377-Support for Llama-3-Nemotron models.md rename to github-data/pull_requests/377 - Support for Llama-3-Nemotron models.md diff --git a/github-data/pull_requests/38-Zen4 Flash Attention - bf16 support.md b/github-data/pull_requests/38 - Zen4 Flash Attention - bf16 support.md similarity index 100% rename from github-data/pull_requests/38-Zen4 Flash Attention - bf16 support.md rename to github-data/pull_requests/38 - Zen4 Flash Attention - bf16 support.md diff --git a/github-data/pull_requests/382-Fix DeepSeek FA.md b/github-data/pull_requests/382 - Fix DeepSeek FA.md similarity index 100% rename from github-data/pull_requests/382-Fix DeepSeek FA.md rename to 
github-data/pull_requests/382 - Fix DeepSeek FA.md diff --git a/github-data/pull_requests/386-FlashMLA-3 for DeepSeek models on CUDA.md b/github-data/pull_requests/386 - FlashMLA-3 for DeepSeek models on CUDA.md similarity index 100% rename from github-data/pull_requests/386-FlashMLA-3 for DeepSeek models on CUDA.md rename to github-data/pull_requests/386 - FlashMLA-3 for DeepSeek models on CUDA.md diff --git a/github-data/pull_requests/39-Add support for bf16 to iqk_mul_mat.md b/github-data/pull_requests/39 - Add support for bf16 to iqk_mul_mat.md similarity index 100% rename from github-data/pull_requests/39-Add support for bf16 to iqk_mul_mat.md rename to github-data/pull_requests/39 - Add support for bf16 to iqk_mul_mat.md diff --git a/github-data/pull_requests/390-Fix build for Xeon Gold 6226R.md b/github-data/pull_requests/390 - Fix build for Xeon Gold 6226R.md similarity index 100% rename from github-data/pull_requests/390-Fix build for Xeon Gold 6226R.md rename to github-data/pull_requests/390 - Fix build for Xeon Gold 6226R.md diff --git a/github-data/pull_requests/391-Fix DeepSeek q8_0 cache.md b/github-data/pull_requests/391 - Fix DeepSeek q8_0 cache.md similarity index 100% rename from github-data/pull_requests/391-Fix DeepSeek q8_0 cache.md rename to github-data/pull_requests/391 - Fix DeepSeek q8_0 cache.md diff --git a/github-data/pull_requests/392-fix some MSVC build problem..md b/github-data/pull_requests/392 - fix some MSVC build problem..md similarity index 100% rename from github-data/pull_requests/392-fix some MSVC build problem..md rename to github-data/pull_requests/392 - fix some MSVC build problem..md diff --git a/github-data/pull_requests/394-Handle incompatible DeepSeek GGUFs.md b/github-data/pull_requests/394 - Handle incompatible DeepSeek GGUFs.md similarity index 100% rename from github-data/pull_requests/394-Handle incompatible DeepSeek GGUFs.md rename to github-data/pull_requests/394 - Handle incompatible DeepSeek GGUFs.md diff --git 
a/github-data/pull_requests/4-Simdify and multi-thread tanh.md b/github-data/pull_requests/4 - Simdify and multi-thread tanh.md similarity index 100% rename from github-data/pull_requests/4-Simdify and multi-thread tanh.md rename to github-data/pull_requests/4 - Simdify and multi-thread tanh.md diff --git a/github-data/pull_requests/40-Adding bf16 support to CUDA.md b/github-data/pull_requests/40 - Adding bf16 support to CUDA.md similarity index 100% rename from github-data/pull_requests/40-Adding bf16 support to CUDA.md rename to github-data/pull_requests/40 - Adding bf16 support to CUDA.md diff --git a/github-data/pull_requests/400-Fix CUDA DeepSeek FlashMLA-3 with quantized KV cache.md b/github-data/pull_requests/400 - Fix CUDA DeepSeek FlashMLA-3 with quantized KV cache.md similarity index 100% rename from github-data/pull_requests/400-Fix CUDA DeepSeek FlashMLA-3 with quantized KV cache.md rename to github-data/pull_requests/400 - Fix CUDA DeepSeek FlashMLA-3 with quantized KV cache.md diff --git a/github-data/pull_requests/402-Fix missing rope_freqs with convert_hf_to_gguf.md b/github-data/pull_requests/402 - Fix missing rope_freqs with convert_hf_to_gguf.md similarity index 100% rename from github-data/pull_requests/402-Fix missing rope_freqs with convert_hf_to_gguf.md rename to github-data/pull_requests/402 - Fix missing rope_freqs with convert_hf_to_gguf.md diff --git a/github-data/pull_requests/404-TG improvements for MoE models.md b/github-data/pull_requests/404 - TG improvements for MoE models.md similarity index 100% rename from github-data/pull_requests/404-TG improvements for MoE models.md rename to github-data/pull_requests/404 - TG improvements for MoE models.md diff --git a/github-data/pull_requests/405-GPU offload policy.md b/github-data/pull_requests/405 - GPU offload policy.md similarity index 100% rename from github-data/pull_requests/405-GPU offload policy.md rename to github-data/pull_requests/405 - GPU offload policy.md diff --git 
a/github-data/pull_requests/406-Fix race in the CUDA DeepSeek FA kernel.md b/github-data/pull_requests/406 - Fix race in the CUDA DeepSeek FA kernel.md similarity index 100% rename from github-data/pull_requests/406-Fix race in the CUDA DeepSeek FA kernel.md rename to github-data/pull_requests/406 - Fix race in the CUDA DeepSeek FA kernel.md diff --git a/github-data/pull_requests/408-Faster DeepSeek FA on CUDA.md b/github-data/pull_requests/408 - Faster DeepSeek FA on CUDA.md similarity index 100% rename from github-data/pull_requests/408-Faster DeepSeek FA on CUDA.md rename to github-data/pull_requests/408 - Faster DeepSeek FA on CUDA.md diff --git a/github-data/pull_requests/409-Enable faster prompt processing with mainline llama.cpp GGUFs.md b/github-data/pull_requests/409 - Enable faster prompt processing with mainline llama.cpp GGUFs.md similarity index 100% rename from github-data/pull_requests/409-Enable faster prompt processing with mainline llama.cpp GGUFs.md rename to github-data/pull_requests/409 - Enable faster prompt processing with mainline llama.cpp GGUFs.md diff --git a/github-data/pull_requests/41-iqk_mul_mat(ARM_NEON)_ adding bf16 support.md b/github-data/pull_requests/41 - iqk_mul_mat_ARM_NEON_ adding bf16 support.md similarity index 100% rename from github-data/pull_requests/41-iqk_mul_mat(ARM_NEON)_ adding bf16 support.md rename to github-data/pull_requests/41 - iqk_mul_mat_ARM_NEON_ adding bf16 support.md diff --git a/github-data/pull_requests/410-Better CPU FA performance for DeepSeek-Lite.md b/github-data/pull_requests/410 - Better CPU FA performance for DeepSeek-Lite.md similarity index 100% rename from github-data/pull_requests/410-Better CPU FA performance for DeepSeek-Lite.md rename to github-data/pull_requests/410 - Better CPU FA performance for DeepSeek-Lite.md diff --git a/github-data/pull_requests/411-Fix imatrix calculation for MLA models.md b/github-data/pull_requests/411 - Fix imatrix calculation for MLA models.md similarity index 
100% rename from github-data/pull_requests/411-Fix imatrix calculation for MLA models.md rename to github-data/pull_requests/411 - Fix imatrix calculation for MLA models.md diff --git a/github-data/pull_requests/413-Fix new CUDA FA on Touring.md b/github-data/pull_requests/413 - Fix new CUDA FA on Touring.md similarity index 100% rename from github-data/pull_requests/413-Fix new CUDA FA on Touring.md rename to github-data/pull_requests/413 - Fix new CUDA FA on Touring.md diff --git a/github-data/pull_requests/415-Fix SER (CPU).md b/github-data/pull_requests/415 - Fix SER _CPU_.md similarity index 100% rename from github-data/pull_requests/415-Fix SER (CPU).md rename to github-data/pull_requests/415 - Fix SER _CPU_.md diff --git a/github-data/pull_requests/416-Fix SER (CUDA).md b/github-data/pull_requests/416 - Fix SER _CUDA_.md similarity index 100% rename from github-data/pull_requests/416-Fix SER (CUDA).md rename to github-data/pull_requests/416 - Fix SER _CUDA_.md diff --git a/github-data/pull_requests/417-CUDA_ quantized GEMM for for IQ4_K, IQ5_K, IQ6_K .md b/github-data/pull_requests/417 - CUDA_ quantized GEMM for for IQ4_K_ IQ5_K_ IQ6_K.md similarity index 100% rename from github-data/pull_requests/417-CUDA_ quantized GEMM for for IQ4_K, IQ5_K, IQ6_K .md rename to github-data/pull_requests/417 - CUDA_ quantized GEMM for for IQ4_K_ IQ5_K_ IQ6_K.md diff --git a/github-data/pull_requests/418-CUDA_ quantized GEMM for for IQ2_KS, IQ2_K, IQ3_K.md b/github-data/pull_requests/418 - CUDA_ quantized GEMM for for IQ2_KS_ IQ2_K_ IQ3_K.md similarity index 100% rename from github-data/pull_requests/418-CUDA_ quantized GEMM for for IQ2_KS, IQ2_K, IQ3_K.md rename to github-data/pull_requests/418 - CUDA_ quantized GEMM for for IQ2_KS_ IQ2_K_ IQ3_K.md diff --git a/github-data/pull_requests/42-Adding fused rms_norm.md b/github-data/pull_requests/42 - Adding fused rms_norm.md similarity index 100% rename from github-data/pull_requests/42-Adding fused rms_norm.md rename to 
github-data/pull_requests/42 - Adding fused rms_norm.md diff --git a/github-data/pull_requests/421-Fix standard attention on the CPU.md b/github-data/pull_requests/421 - Fix standard attention on the CPU.md similarity index 100% rename from github-data/pull_requests/421-Fix standard attention on the CPU.md rename to github-data/pull_requests/421 - Fix standard attention on the CPU.md diff --git a/github-data/pull_requests/422-Adding IQ5_KS - 5.25 bpw quants.md b/github-data/pull_requests/422 - Adding IQ5_KS - 5.25 bpw quants.md similarity index 100% rename from github-data/pull_requests/422-Adding IQ5_KS - 5.25 bpw quants.md rename to github-data/pull_requests/422 - Adding IQ5_KS - 5.25 bpw quants.md diff --git a/github-data/pull_requests/424-Adding forgotten template instance for iq5_ks.md b/github-data/pull_requests/424 - Adding forgotten template instance for iq5_ks.md similarity index 100% rename from github-data/pull_requests/424-Adding forgotten template instance for iq5_ks.md rename to github-data/pull_requests/424 - Adding forgotten template instance for iq5_ks.md diff --git a/github-data/pull_requests/426-IQ5_KS_R4_ row-interleaved IQ5_KS.md b/github-data/pull_requests/426 - IQ5_KS_R4_ row-interleaved IQ5_KS.md similarity index 100% rename from github-data/pull_requests/426-IQ5_KS_R4_ row-interleaved IQ5_KS.md rename to github-data/pull_requests/426 - IQ5_KS_R4_ row-interleaved IQ5_KS.md diff --git a/github-data/pull_requests/427-Fix AVX2 implementation of IQ4_K, IQ4_KS, IQ5_K, IQ6_K.md b/github-data/pull_requests/427 - Fix AVX2 implementation of IQ4_K_ IQ4_KS_ IQ5_K_ IQ6_K.md similarity index 100% rename from github-data/pull_requests/427-Fix AVX2 implementation of IQ4_K, IQ4_KS, IQ5_K, IQ6_K.md rename to github-data/pull_requests/427 - Fix AVX2 implementation of IQ4_K_ IQ4_KS_ IQ5_K_ IQ6_K.md diff --git a/github-data/pull_requests/428-Zen4_ Faster PP for IQ2_KS, IQ4_KS, IQ5_KS.md b/github-data/pull_requests/428 - Zen4_ Faster PP for IQ2_KS_ IQ4_KS_ 
IQ5_KS.md similarity index 100% rename from github-data/pull_requests/428-Zen4_ Faster PP for IQ2_KS, IQ4_KS, IQ5_KS.md rename to github-data/pull_requests/428 - Zen4_ Faster PP for IQ2_KS_ IQ4_KS_ IQ5_KS.md diff --git a/github-data/pull_requests/429-Option to enable or disable the CPU FA kernels.md b/github-data/pull_requests/429 - Option to enable or disable the CPU FA kernels.md similarity index 100% rename from github-data/pull_requests/429-Option to enable or disable the CPU FA kernels.md rename to github-data/pull_requests/429 - Option to enable or disable the CPU FA kernels.md diff --git a/github-data/pull_requests/43-iq2_tn_ slightly faster PP on Zen4.md b/github-data/pull_requests/43 - iq2_tn_ slightly faster PP on Zen4.md similarity index 100% rename from github-data/pull_requests/43-iq2_tn_ slightly faster PP on Zen4.md rename to github-data/pull_requests/43 - iq2_tn_ slightly faster PP on Zen4.md diff --git a/github-data/pull_requests/430-Disable multi-add for now.md b/github-data/pull_requests/430 - Disable multi-add for now.md similarity index 100% rename from github-data/pull_requests/430-Disable multi-add for now.md rename to github-data/pull_requests/430 - Disable multi-add for now.md diff --git a/github-data/pull_requests/431-Forgotten MMQ ref and typo.md b/github-data/pull_requests/431 - Forgotten MMQ ref and typo.md similarity index 100% rename from github-data/pull_requests/431-Forgotten MMQ ref and typo.md rename to github-data/pull_requests/431 - Forgotten MMQ ref and typo.md diff --git a/github-data/pull_requests/435-Refactor iqk_mul_mat.cpp.md b/github-data/pull_requests/435 - Refactor iqk_mul_mat.cpp.md similarity index 100% rename from github-data/pull_requests/435-Refactor iqk_mul_mat.cpp.md rename to github-data/pull_requests/435 - Refactor iqk_mul_mat.cpp.md diff --git a/github-data/pull_requests/438-Another attempt to fix the illegal memory access bug.md b/github-data/pull_requests/438 - Another attempt to fix the illegal memory 
access bug.md similarity index 100% rename from github-data/pull_requests/438-Another attempt to fix the illegal memory access bug.md rename to github-data/pull_requests/438 - Another attempt to fix the illegal memory access bug.md diff --git a/github-data/pull_requests/439-Bug fixes from mainline.md b/github-data/pull_requests/439 - Bug fixes from mainline.md similarity index 100% rename from github-data/pull_requests/439-Bug fixes from mainline.md rename to github-data/pull_requests/439 - Bug fixes from mainline.md diff --git a/github-data/pull_requests/44-Adding IQ1_TN - 1.6875 bpw for TriLM ternary models.md b/github-data/pull_requests/44 - Adding IQ1_TN - 1.6875 bpw for TriLM ternary models.md similarity index 100% rename from github-data/pull_requests/44-Adding IQ1_TN - 1.6875 bpw for TriLM ternary models.md rename to github-data/pull_requests/44 - Adding IQ1_TN - 1.6875 bpw for TriLM ternary models.md diff --git a/github-data/pull_requests/441-Trellis quants with CPU inference.md b/github-data/pull_requests/441 - Trellis quants with CPU inference.md similarity index 100% rename from github-data/pull_requests/441-Trellis quants with CPU inference.md rename to github-data/pull_requests/441 - Trellis quants with CPU inference.md diff --git a/github-data/pull_requests/442-CUDA call tracer.md b/github-data/pull_requests/442 - CUDA call tracer.md similarity index 100% rename from github-data/pull_requests/442-CUDA call tracer.md rename to github-data/pull_requests/442 - CUDA call tracer.md diff --git a/github-data/pull_requests/443-Streamline a bit the quant strategies.md b/github-data/pull_requests/443 - Streamline a bit the quant strategies.md similarity index 100% rename from github-data/pull_requests/443-Streamline a bit the quant strategies.md rename to github-data/pull_requests/443 - Streamline a bit the quant strategies.md diff --git a/github-data/pull_requests/444-gguf-split _ update.md b/github-data/pull_requests/444 - gguf-split _ update.md similarity 
index 100% rename from github-data/pull_requests/444-gguf-split _ update.md rename to github-data/pull_requests/444 - gguf-split _ update.md diff --git a/github-data/pull_requests/445-Fix typo in non-AVX2 code branch.md b/github-data/pull_requests/445 - Fix typo in non-AVX2 code branch.md similarity index 100% rename from github-data/pull_requests/445-Fix typo in non-AVX2 code branch.md rename to github-data/pull_requests/445 - Fix typo in non-AVX2 code branch.md diff --git a/github-data/pull_requests/446-Fix bug in MMVQ kernel.md b/github-data/pull_requests/446 - Fix bug in MMVQ kernel.md similarity index 100% rename from github-data/pull_requests/446-Fix bug in MMVQ kernel.md rename to github-data/pull_requests/446 - Fix bug in MMVQ kernel.md diff --git a/github-data/pull_requests/448-Fix MSVC compilation.md b/github-data/pull_requests/448 - Fix MSVC compilation.md similarity index 100% rename from github-data/pull_requests/448-Fix MSVC compilation.md rename to github-data/pull_requests/448 - Fix MSVC compilation.md diff --git a/github-data/pull_requests/449-Legacy quants conversion schemes in convert_hf_to_gguf.py.md b/github-data/pull_requests/449 - Legacy quants conversion schemes in convert_hf_to_gguf.py.md similarity index 100% rename from github-data/pull_requests/449-Legacy quants conversion schemes in convert_hf_to_gguf.py.md rename to github-data/pull_requests/449 - Legacy quants conversion schemes in convert_hf_to_gguf.py.md diff --git a/github-data/pull_requests/45-Add CUDA support for IQ1_TN.md b/github-data/pull_requests/45 - Add CUDA support for IQ1_TN.md similarity index 100% rename from github-data/pull_requests/45-Add CUDA support for IQ1_TN.md rename to github-data/pull_requests/45 - Add CUDA support for IQ1_TN.md diff --git a/github-data/pull_requests/453-Faster IQ3_KT and IQ4_KT.md b/github-data/pull_requests/453 - Faster IQ3_KT and IQ4_KT.md similarity index 100% rename from github-data/pull_requests/453-Faster IQ3_KT and IQ4_KT.md rename to 
github-data/pull_requests/453 - Faster IQ3_KT and IQ4_KT.md diff --git a/github-data/pull_requests/454-Add support for FP8 GGUF creation and re-quantization (WIP).md b/github-data/pull_requests/454 - Add support for FP8 GGUF creation and re-quantization _WIP_.md similarity index 100% rename from github-data/pull_requests/454-Add support for FP8 GGUF creation and re-quantization (WIP).md rename to github-data/pull_requests/454 - Add support for FP8 GGUF creation and re-quantization _WIP_.md diff --git a/github-data/pull_requests/457-Remove GGML_IQK_MUL_MAT option.md b/github-data/pull_requests/457 - Remove GGML_IQK_MUL_MAT option.md similarity index 100% rename from github-data/pull_requests/457-Remove GGML_IQK_MUL_MAT option.md rename to github-data/pull_requests/457 - Remove GGML_IQK_MUL_MAT option.md diff --git a/github-data/pull_requests/458-Add missing gguf-py constants.md b/github-data/pull_requests/458 - Add missing gguf-py constants.md similarity index 100% rename from github-data/pull_requests/458-Add missing gguf-py constants.md rename to github-data/pull_requests/458 - Add missing gguf-py constants.md diff --git a/github-data/pull_requests/46-IQ1_TN Metal implementation.md b/github-data/pull_requests/46 - IQ1_TN Metal implementation.md similarity index 100% rename from github-data/pull_requests/46-IQ1_TN Metal implementation.md rename to github-data/pull_requests/46 - IQ1_TN Metal implementation.md diff --git a/github-data/pull_requests/460-aarch64 kernels for KT quants.md b/github-data/pull_requests/460 - aarch64 kernels for KT quants.md similarity index 100% rename from github-data/pull_requests/460-aarch64 kernels for KT quants.md rename to github-data/pull_requests/460 - aarch64 kernels for KT quants.md diff --git a/github-data/pull_requests/461-CUDA implementation for IQ2_K_R4, IQ3_K_R4, IQ4_K_R4, IQ5_K_R4.md b/github-data/pull_requests/461 - CUDA implementation for IQ2_K_R4_ IQ3_K_R4_ IQ4_K_R4_ IQ5_K_R4.md similarity index 100% rename from 
github-data/pull_requests/461-CUDA implementation for IQ2_K_R4, IQ3_K_R4, IQ4_K_R4, IQ5_K_R4.md rename to github-data/pull_requests/461 - CUDA implementation for IQ2_K_R4_ IQ3_K_R4_ IQ4_K_R4_ IQ5_K_R4.md diff --git a/github-data/pull_requests/462-CUDA GEMM and GEMV for IQ4_KS_R4 and IQ5_KS_R4.md b/github-data/pull_requests/462 - CUDA GEMM and GEMV for IQ4_KS_R4 and IQ5_KS_R4.md similarity index 100% rename from github-data/pull_requests/462-CUDA GEMM and GEMV for IQ4_KS_R4 and IQ5_KS_R4.md rename to github-data/pull_requests/462 - CUDA GEMM and GEMV for IQ4_KS_R4 and IQ5_KS_R4.md diff --git a/github-data/pull_requests/465-Set cache_prompt default to true.md b/github-data/pull_requests/465 - Set cache_prompt default to true.md similarity index 100% rename from github-data/pull_requests/465-Set cache_prompt default to true.md rename to github-data/pull_requests/465 - Set cache_prompt default to true.md diff --git a/github-data/pull_requests/468-Minor (~2%) iq2_ks TG performance improvement on CUDA.md b/github-data/pull_requests/468 - Minor _2_ iq2_ks TG performance improvement on CUDA.md similarity index 100% rename from github-data/pull_requests/468-Minor (~2%) iq2_ks TG performance improvement on CUDA.md rename to github-data/pull_requests/468 - Minor _2_ iq2_ks TG performance improvement on CUDA.md diff --git a/github-data/pull_requests/469-Replace MLA-specific KV cache with the standard KV cache.md b/github-data/pull_requests/469 - Replace MLA-specific KV cache with the standard KV cache.md similarity index 100% rename from github-data/pull_requests/469-Replace MLA-specific KV cache with the standard KV cache.md rename to github-data/pull_requests/469 - Replace MLA-specific KV cache with the standard KV cache.md diff --git a/github-data/pull_requests/47-iq2_tn_ slightly better performance on AVX2.md b/github-data/pull_requests/47 - iq2_tn_ slightly better performance on AVX2.md similarity index 100% rename from github-data/pull_requests/47-iq2_tn_ slightly better 
performance on AVX2.md rename to github-data/pull_requests/47 - iq2_tn_ slightly better performance on AVX2.md diff --git a/github-data/pull_requests/470-Send [DONE] for OAI compatibility.md b/github-data/pull_requests/470 - Send _DONE_ for OAI compatibility.md similarity index 100% rename from github-data/pull_requests/470-Send [DONE] for OAI compatibility.md rename to github-data/pull_requests/470 - Send _DONE_ for OAI compatibility.md diff --git a/github-data/pull_requests/471-NEON implementation for trellis quants.md b/github-data/pull_requests/471 - NEON implementation for trellis quants.md similarity index 100% rename from github-data/pull_requests/471-NEON implementation for trellis quants.md rename to github-data/pull_requests/471 - NEON implementation for trellis quants.md diff --git a/github-data/pull_requests/473-Replace MLA-specific KV cache with the standard KV cache V2.md b/github-data/pull_requests/473 - Replace MLA-specific KV cache with the standard KV cache V2.md similarity index 100% rename from github-data/pull_requests/473-Replace MLA-specific KV cache with the standard KV cache V2.md rename to github-data/pull_requests/473 - Replace MLA-specific KV cache with the standard KV cache V2.md diff --git a/github-data/pull_requests/475-Metal implementatio for the trellis quants..md b/github-data/pull_requests/475 - Metal implementatio for the trellis quants..md similarity index 100% rename from github-data/pull_requests/475-Metal implementatio for the trellis quants..md rename to github-data/pull_requests/475 - Metal implementatio for the trellis quants..md diff --git a/github-data/pull_requests/478-forgotten refs and typo.md b/github-data/pull_requests/478 - forgotten refs and typo.md similarity index 100% rename from github-data/pull_requests/478-forgotten refs and typo.md rename to github-data/pull_requests/478 - forgotten refs and typo.md diff --git a/github-data/pull_requests/48-AVX2 Flash Attention.md b/github-data/pull_requests/48 - AVX2 Flash 
Attention.md similarity index 100% rename from github-data/pull_requests/48-AVX2 Flash Attention.md rename to github-data/pull_requests/48 - AVX2 Flash Attention.md diff --git a/github-data/pull_requests/480-Rpc improvement.md b/github-data/pull_requests/480 - Rpc improvement.md similarity index 100% rename from github-data/pull_requests/480-Rpc improvement.md rename to github-data/pull_requests/480 - Rpc improvement.md diff --git a/github-data/pull_requests/481-Webui improvement.md b/github-data/pull_requests/481 - Webui improvement.md similarity index 100% rename from github-data/pull_requests/481-Webui improvement.md rename to github-data/pull_requests/481 - Webui improvement.md diff --git a/github-data/pull_requests/482-Trellis quants_ faster CPU prompt processing.md b/github-data/pull_requests/482 - Trellis quants_ faster CPU prompt processing.md similarity index 100% rename from github-data/pull_requests/482-Trellis quants_ faster CPU prompt processing.md rename to github-data/pull_requests/482 - Trellis quants_ faster CPU prompt processing.md diff --git a/github-data/pull_requests/483- convert_hf_to_gguf.py _ conversion from hf weights to Q6_0.md b/github-data/pull_requests/483 - convert_hf_to_gguf.py _ conversion from hf weights to Q6_0.md similarity index 100% rename from github-data/pull_requests/483- convert_hf_to_gguf.py _ conversion from hf weights to Q6_0.md rename to github-data/pull_requests/483 - convert_hf_to_gguf.py _ conversion from hf weights to Q6_0.md diff --git a/github-data/pull_requests/484-BF16 Trellis implementation.md b/github-data/pull_requests/484 - BF16 Trellis implementation.md similarity index 100% rename from github-data/pull_requests/484-BF16 Trellis implementation.md rename to github-data/pull_requests/484 - BF16 Trellis implementation.md diff --git a/github-data/pull_requests/486-Adding the XTC sampler.md b/github-data/pull_requests/486 - Adding the XTC sampler.md similarity index 100% rename from 
github-data/pull_requests/486-Adding the XTC sampler.md rename to github-data/pull_requests/486 - Adding the XTC sampler.md diff --git a/github-data/pull_requests/487-Make sure MMVQ is supported before using it.md b/github-data/pull_requests/487 - Make sure MMVQ is supported before using it.md similarity index 100% rename from github-data/pull_requests/487-Make sure MMVQ is supported before using it.md rename to github-data/pull_requests/487 - Make sure MMVQ is supported before using it.md diff --git a/github-data/pull_requests/488-Faster CPU prompt processing for Trellis quants and MoE models.md b/github-data/pull_requests/488 - Faster CPU prompt processing for Trellis quants and MoE models.md similarity index 100% rename from github-data/pull_requests/488-Faster CPU prompt processing for Trellis quants and MoE models.md rename to github-data/pull_requests/488 - Faster CPU prompt processing for Trellis quants and MoE models.md diff --git a/github-data/pull_requests/489-Adding top-n-sigma sampler.md b/github-data/pull_requests/489 - Adding top-n-sigma sampler.md similarity index 100% rename from github-data/pull_requests/489-Adding top-n-sigma sampler.md rename to github-data/pull_requests/489 - Adding top-n-sigma sampler.md diff --git a/github-data/pull_requests/49-ARM_NEON Flash Attention.md b/github-data/pull_requests/49 - ARM_NEON Flash Attention.md similarity index 100% rename from github-data/pull_requests/49-ARM_NEON Flash Attention.md rename to github-data/pull_requests/49 - ARM_NEON Flash Attention.md diff --git a/github-data/pull_requests/492-CUDA implementation for IQ1_S_R4.md b/github-data/pull_requests/492 - CUDA implementation for IQ1_S_R4.md similarity index 100% rename from github-data/pull_requests/492-CUDA implementation for IQ1_S_R4.md rename to github-data/pull_requests/492 - CUDA implementation for IQ1_S_R4.md diff --git a/github-data/pull_requests/493-MMQ implementation for IQ4_KS_R4 and IQ5_KS_R4.md b/github-data/pull_requests/493 - MMQ 
implementation for IQ4_KS_R4 and IQ5_KS_R4.md similarity index 100% rename from github-data/pull_requests/493-MMQ implementation for IQ4_KS_R4 and IQ5_KS_R4.md rename to github-data/pull_requests/493 - MMQ implementation for IQ4_KS_R4 and IQ5_KS_R4.md diff --git a/github-data/pull_requests/494-IQ1_M_R4 CUDA implementation.md b/github-data/pull_requests/494 - IQ1_M_R4 CUDA implementation.md similarity index 100% rename from github-data/pull_requests/494-IQ1_M_R4 CUDA implementation.md rename to github-data/pull_requests/494 - IQ1_M_R4 CUDA implementation.md diff --git a/github-data/pull_requests/495-Check if ffn_up and ffn_gate are of the same type before using fmoe.md b/github-data/pull_requests/495 - Check if ffn_up and ffn_gate are of the same type before using fmoe.md similarity index 100% rename from github-data/pull_requests/495-Check if ffn_up and ffn_gate are of the same type before using fmoe.md rename to github-data/pull_requests/495 - Check if ffn_up and ffn_gate are of the same type before using fmoe.md diff --git a/github-data/pull_requests/496-Quick hack_ add the MLA flag to llama_hparams.md b/github-data/pull_requests/496 - Quick hack_ add the MLA flag to llama_hparams.md similarity index 100% rename from github-data/pull_requests/496-Quick hack_ add the MLA flag to llama_hparams.md rename to github-data/pull_requests/496 - Quick hack_ add the MLA flag to llama_hparams.md diff --git a/github-data/pull_requests/497-Make prompt cache saving and restoring MLA aware.md b/github-data/pull_requests/497 - Make prompt cache saving and restoring MLA aware.md similarity index 100% rename from github-data/pull_requests/497-Make prompt cache saving and restoring MLA aware.md rename to github-data/pull_requests/497 - Make prompt cache saving and restoring MLA aware.md diff --git a/github-data/pull_requests/5-Fusing a mat mul op followed by a scale op on the CPU.md b/github-data/pull_requests/5 - Fusing a mat mul op followed by a scale op on the CPU.md similarity 
index 100% rename from github-data/pull_requests/5-Fusing a mat mul op followed by a scale op on the CPU.md rename to github-data/pull_requests/5 - Fusing a mat mul op followed by a scale op on the CPU.md diff --git a/github-data/pull_requests/50-AVX2 Flash Attention 2.md b/github-data/pull_requests/50 - AVX2 Flash Attention 2.md similarity index 100% rename from github-data/pull_requests/50-AVX2 Flash Attention 2.md rename to github-data/pull_requests/50 - AVX2 Flash Attention 2.md diff --git a/github-data/pull_requests/501-Fix #499.md b/github-data/pull_requests/501 - Fix _499.md similarity index 100% rename from github-data/pull_requests/501-Fix #499.md rename to github-data/pull_requests/501 - Fix _499.md diff --git a/github-data/pull_requests/502-Add an endpoint that lists all the saved prompt caches to server.md b/github-data/pull_requests/502 - Add an endpoint that lists all the saved prompt caches to server.md similarity index 100% rename from github-data/pull_requests/502-Add an endpoint that lists all the saved prompt caches to server.md rename to github-data/pull_requests/502 - Add an endpoint that lists all the saved prompt caches to server.md diff --git a/github-data/pull_requests/504-Add DRY and fix the server to use other new samplers..md b/github-data/pull_requests/504 - Add DRY and fix the server to use other new samplers..md similarity index 100% rename from github-data/pull_requests/504-Add DRY and fix the server to use other new samplers..md rename to github-data/pull_requests/504 - Add DRY and fix the server to use other new samplers..md diff --git a/github-data/pull_requests/505-New IQ4_KT trellis implementation.md b/github-data/pull_requests/505 - New IQ4_KT trellis implementation.md similarity index 100% rename from github-data/pull_requests/505-New IQ4_KT trellis implementation.md rename to github-data/pull_requests/505 - New IQ4_KT trellis implementation.md diff --git a/github-data/pull_requests/506-Fix non rpc build error.md 
b/github-data/pull_requests/506 - Fix non rpc build error.md similarity index 100% rename from github-data/pull_requests/506-Fix non rpc build error.md rename to github-data/pull_requests/506 - Fix non rpc build error.md diff --git a/github-data/pull_requests/508-Fix Compile error (C2668).md b/github-data/pull_requests/508 - Fix Compile error _C2668_.md similarity index 100% rename from github-data/pull_requests/508-Fix Compile error (C2668).md rename to github-data/pull_requests/508 - Fix Compile error _C2668_.md diff --git a/github-data/pull_requests/509-Docs update.md b/github-data/pull_requests/509 - Docs update.md similarity index 100% rename from github-data/pull_requests/509-Docs update.md rename to github-data/pull_requests/509 - Docs update.md diff --git a/github-data/pull_requests/51-Quantized Flash Attention for all supported CPU platforms.md b/github-data/pull_requests/51 - Quantized Flash Attention for all supported CPU platforms.md similarity index 100% rename from github-data/pull_requests/51-Quantized Flash Attention for all supported CPU platforms.md rename to github-data/pull_requests/51 - Quantized Flash Attention for all supported CPU platforms.md diff --git a/github-data/pull_requests/510-Update News section of readme.md b/github-data/pull_requests/510 - Update News section of readme.md similarity index 100% rename from github-data/pull_requests/510-Update News section of readme.md rename to github-data/pull_requests/510 - Update News section of readme.md diff --git a/github-data/pull_requests/511-New IQ2_KT.md b/github-data/pull_requests/511 - New IQ2_KT.md similarity index 100% rename from github-data/pull_requests/511-New IQ2_KT.md rename to github-data/pull_requests/511 - New IQ2_KT.md diff --git a/github-data/pull_requests/512-Add top n sigma sampler in webui and other webui fix.md b/github-data/pull_requests/512 - Add top n sigma sampler in webui and other webui fix.md similarity index 100% rename from github-data/pull_requests/512-Add 
top n sigma sampler in webui and other webui fix.md rename to github-data/pull_requests/512 - Add top n sigma sampler in webui and other webui fix.md diff --git a/github-data/pull_requests/513-add dry sampler.md b/github-data/pull_requests/513 - add dry sampler.md similarity index 100% rename from github-data/pull_requests/513-add dry sampler.md rename to github-data/pull_requests/513 - add dry sampler.md diff --git a/github-data/pull_requests/515-IQ2_XXS_ much faster CPU prompt processing.md b/github-data/pull_requests/515 - IQ2_XXS_ much faster CPU prompt processing.md similarity index 100% rename from github-data/pull_requests/515-IQ2_XXS_ much faster CPU prompt processing.md rename to github-data/pull_requests/515 - IQ2_XXS_ much faster CPU prompt processing.md diff --git a/github-data/pull_requests/516-Much faster iq3_xxs GEMM via repacking to q8_0_r8 (AVX2).md b/github-data/pull_requests/516 - Much faster iq3_xxs GEMM via repacking to q8_0_r8 _AVX2_.md similarity index 100% rename from github-data/pull_requests/516-Much faster iq3_xxs GEMM via repacking to q8_0_r8 (AVX2).md rename to github-data/pull_requests/516 - Much faster iq3_xxs GEMM via repacking to q8_0_r8 _AVX2_.md diff --git a/github-data/pull_requests/517-IQ1_S_ much faster CPU prompt processing.md b/github-data/pull_requests/517 - IQ1_S_ much faster CPU prompt processing.md similarity index 100% rename from github-data/pull_requests/517-IQ1_S_ much faster CPU prompt processing.md rename to github-data/pull_requests/517 - IQ1_S_ much faster CPU prompt processing.md diff --git a/github-data/pull_requests/518-IQ3_S_ much faster CPU prompt processing.md b/github-data/pull_requests/518 - IQ3_S_ much faster CPU prompt processing.md similarity index 100% rename from github-data/pull_requests/518-IQ3_S_ much faster CPU prompt processing.md rename to github-data/pull_requests/518 - IQ3_S_ much faster CPU prompt processing.md diff --git a/github-data/pull_requests/52-Fix bug and D _ 128 case for Q8_0 
k-cache.md b/github-data/pull_requests/52 - Fix bug and D _ 128 case for Q8_0 k-cache.md similarity index 100% rename from github-data/pull_requests/52-Fix bug and D _ 128 case for Q8_0 k-cache.md rename to github-data/pull_requests/52 - Fix bug and D _ 128 case for Q8_0 k-cache.md diff --git a/github-data/pull_requests/520-Better strategy for GPU offload.md b/github-data/pull_requests/520 - Better strategy for GPU offload.md similarity index 100% rename from github-data/pull_requests/520-Better strategy for GPU offload.md rename to github-data/pull_requests/520 - Better strategy for GPU offload.md diff --git a/github-data/pull_requests/524-Perhaps a slightly better GEMV version for IQ2_XXS, IQ3_XXS, IQ3_S.md b/github-data/pull_requests/524 - Perhaps a slightly better GEMV version for IQ2_XXS_ IQ3_XXS_ IQ3_S.md similarity index 100% rename from github-data/pull_requests/524-Perhaps a slightly better GEMV version for IQ2_XXS, IQ3_XXS, IQ3_S.md rename to github-data/pull_requests/524 - Perhaps a slightly better GEMV version for IQ2_XXS_ IQ3_XXS_ IQ3_S.md diff --git a/github-data/pull_requests/525-Faster CPU prompt processing for Q4_K and Q5_K.md b/github-data/pull_requests/525 - Faster CPU prompt processing for Q4_K and Q5_K.md similarity index 100% rename from github-data/pull_requests/525-Faster CPU prompt processing for Q4_K and Q5_K.md rename to github-data/pull_requests/525 - Faster CPU prompt processing for Q4_K and Q5_K.md diff --git a/github-data/pull_requests/528-Fix bug introduced in #524_#525.md b/github-data/pull_requests/528 - Fix bug introduced in _524_525.md similarity index 100% rename from github-data/pull_requests/528-Fix bug introduced in #524_#525.md rename to github-data/pull_requests/528 - Fix bug introduced in _524_525.md diff --git a/github-data/pull_requests/529-New IQ2_KT, IQ3_KT and IQ4_KT, V2.md b/github-data/pull_requests/529 - New IQ2_KT_ IQ3_KT and IQ4_KT_ V2.md similarity index 100% rename from github-data/pull_requests/529-New IQ2_KT, 
IQ3_KT and IQ4_KT, V2.md rename to github-data/pull_requests/529 - New IQ2_KT_ IQ3_KT and IQ4_KT_ V2.md diff --git a/github-data/pull_requests/53-Quantization mixes tweaks.md b/github-data/pull_requests/53 - Quantization mixes tweaks.md similarity index 100% rename from github-data/pull_requests/53-Quantization mixes tweaks.md rename to github-data/pull_requests/53 - Quantization mixes tweaks.md diff --git a/github-data/pull_requests/531-Much faster CPU prompt processing (part 1).md b/github-data/pull_requests/531 - Much faster CPU prompt processing _part 1_.md similarity index 100% rename from github-data/pull_requests/531-Much faster CPU prompt processing (part 1).md rename to github-data/pull_requests/531 - Much faster CPU prompt processing _part 1_.md diff --git a/github-data/pull_requests/533-Much faster CPU prompt processing (part 2).md b/github-data/pull_requests/533 - Much faster CPU prompt processing _part 2_.md similarity index 100% rename from github-data/pull_requests/533-Much faster CPU prompt processing (part 2).md rename to github-data/pull_requests/533 - Much faster CPU prompt processing _part 2_.md diff --git a/github-data/pull_requests/534-Much faster CPU prompt processing (part 3).md b/github-data/pull_requests/534 - Much faster CPU prompt processing _part 3_.md similarity index 100% rename from github-data/pull_requests/534-Much faster CPU prompt processing (part 3).md rename to github-data/pull_requests/534 - Much faster CPU prompt processing _part 3_.md diff --git a/github-data/pull_requests/535-Minor readme update.md b/github-data/pull_requests/535 - Minor readme update.md similarity index 100% rename from github-data/pull_requests/535-Minor readme update.md rename to github-data/pull_requests/535 - Minor readme update.md diff --git a/github-data/pull_requests/536-Fix KT Neon _ ARM typo.md b/github-data/pull_requests/536 - Fix KT Neon _ ARM typo.md similarity index 100% rename from github-data/pull_requests/536-Fix KT Neon _ ARM typo.md 
rename to github-data/pull_requests/536 - Fix KT Neon _ ARM typo.md diff --git a/github-data/pull_requests/537-Update CMakeLists.txt to fix NDEBUG handling.md b/github-data/pull_requests/537 - Update CMakeLists.txt to fix NDEBUG handling.md similarity index 100% rename from github-data/pull_requests/537-Update CMakeLists.txt to fix NDEBUG handling.md rename to github-data/pull_requests/537 - Update CMakeLists.txt to fix NDEBUG handling.md diff --git a/github-data/pull_requests/54-Improve Q4_0 and Q8_0 performance on AVX2_Zen4.md b/github-data/pull_requests/54 - Improve Q4_0 and Q8_0 performance on AVX2_Zen4.md similarity index 100% rename from github-data/pull_requests/54-Improve Q4_0 and Q8_0 performance on AVX2_Zen4.md rename to github-data/pull_requests/54 - Improve Q4_0 and Q8_0 performance on AVX2_Zen4.md diff --git a/github-data/pull_requests/540-Fix missed block_q8_x2 bf16 -_ i16 change.md b/github-data/pull_requests/540 - Fix missed block_q8_x2 bf16 -_ i16 change.md similarity index 100% rename from github-data/pull_requests/540-Fix missed block_q8_x2 bf16 -_ i16 change.md rename to github-data/pull_requests/540 - Fix missed block_q8_x2 bf16 -_ i16 change.md diff --git a/github-data/pull_requests/541-Perhaps slightly faster trellis quants.md b/github-data/pull_requests/541 - Perhaps slightly faster trellis quants.md similarity index 100% rename from github-data/pull_requests/541-Perhaps slightly faster trellis quants.md rename to github-data/pull_requests/541 - Perhaps slightly faster trellis quants.md diff --git a/github-data/pull_requests/542-Fix NEON build.md b/github-data/pull_requests/542 - Fix NEON build.md similarity index 100% rename from github-data/pull_requests/542-Fix NEON build.md rename to github-data/pull_requests/542 - Fix NEON build.md diff --git a/github-data/pull_requests/544-New integer trellis on ARM_NEON .md b/github-data/pull_requests/544 - New integer trellis on ARM_NEON.md similarity index 100% rename from 
github-data/pull_requests/544-New integer trellis on ARM_NEON .md rename to github-data/pull_requests/544 - New integer trellis on ARM_NEON.md diff --git a/github-data/pull_requests/546-Faster ARM_NEON GEMM implementation for legacy quants.md b/github-data/pull_requests/546 - Faster ARM_NEON GEMM implementation for legacy quants.md similarity index 100% rename from github-data/pull_requests/546-Faster ARM_NEON GEMM implementation for legacy quants.md rename to github-data/pull_requests/546 - Faster ARM_NEON GEMM implementation for legacy quants.md diff --git a/github-data/pull_requests/547-build_ add script to simplify build&test workflow for Android.md b/github-data/pull_requests/547 - build_ add script to simplify build_test workflow for Android.md similarity index 100% rename from github-data/pull_requests/547-build_ add script to simplify build&test workflow for Android.md rename to github-data/pull_requests/547 - build_ add script to simplify build_test workflow for Android.md diff --git a/github-data/pull_requests/549-Much faster prompt processing for IQK quants (ARM_NEON).md b/github-data/pull_requests/549 - Much faster prompt processing for IQK quants _ARM_NEON_.md similarity index 100% rename from github-data/pull_requests/549-Much faster prompt processing for IQK quants (ARM_NEON).md rename to github-data/pull_requests/549 - Much faster prompt processing for IQK quants _ARM_NEON_.md diff --git a/github-data/pull_requests/55-Improve Q5_0 performance on AVX2.md b/github-data/pull_requests/55 - Improve Q5_0 performance on AVX2.md similarity index 100% rename from github-data/pull_requests/55-Improve Q5_0 performance on AVX2.md rename to github-data/pull_requests/55 - Improve Q5_0 performance on AVX2.md diff --git a/github-data/pull_requests/550-Much faster prompt processing for I-quants (ARM_NEON).md b/github-data/pull_requests/550 - Much faster prompt processing for I-quants _ARM_NEON_.md similarity index 100% rename from github-data/pull_requests/550-Much 
faster prompt processing for I-quants (ARM_NEON).md rename to github-data/pull_requests/550 - Much faster prompt processing for I-quants _ARM_NEON_.md diff --git a/github-data/pull_requests/552-Much faster prompt processing for k-quants (ARM_NEON).md b/github-data/pull_requests/552 - Much faster prompt processing for k-quants _ARM_NEON_.md similarity index 100% rename from github-data/pull_requests/552-Much faster prompt processing for k-quants (ARM_NEON).md rename to github-data/pull_requests/552 - Much faster prompt processing for k-quants _ARM_NEON_.md diff --git a/github-data/pull_requests/553-Much faster prompt processing for IQ1_S and IQ1_M on ARM_NEON.md b/github-data/pull_requests/553 - Much faster prompt processing for IQ1_S and IQ1_M on ARM_NEON.md similarity index 100% rename from github-data/pull_requests/553-Much faster prompt processing for IQ1_S and IQ1_M on ARM_NEON.md rename to github-data/pull_requests/553 - Much faster prompt processing for IQ1_S and IQ1_M on ARM_NEON.md diff --git a/github-data/pull_requests/554-Update README.md to add quickstart section.md b/github-data/pull_requests/554 - Update README.md to add quickstart section.md similarity index 100% rename from github-data/pull_requests/554-Update README.md to add quickstart section.md rename to github-data/pull_requests/554 - Update README.md to add quickstart section.md diff --git a/github-data/pull_requests/555-Add Falcon-Edge support.md b/github-data/pull_requests/555 - Add Falcon-Edge support.md similarity index 100% rename from github-data/pull_requests/555-Add Falcon-Edge support.md rename to github-data/pull_requests/555 - Add Falcon-Edge support.md diff --git a/github-data/pull_requests/557-CUDA_ MMQ for iqX_r4 quants .md b/github-data/pull_requests/557 - CUDA_ MMQ for iqX_r4 quants.md similarity index 100% rename from github-data/pull_requests/557-CUDA_ MMQ for iqX_r4 quants .md rename to github-data/pull_requests/557 - CUDA_ MMQ for iqX_r4 quants.md diff --git 
a/github-data/pull_requests/558-Add mikupad to ik_llama as an alternative WebUI.md b/github-data/pull_requests/558 - Add mikupad to ik_llama as an alternative WebUI.md similarity index 100% rename from github-data/pull_requests/558-Add mikupad to ik_llama as an alternative WebUI.md rename to github-data/pull_requests/558 - Add mikupad to ik_llama as an alternative WebUI.md diff --git a/github-data/pull_requests/559-Use cuBLAS for large batches and quants with block size 16.md b/github-data/pull_requests/559 - Use cuBLAS for large batches and quants with block size 16.md similarity index 100% rename from github-data/pull_requests/559-Use cuBLAS for large batches and quants with block size 16.md rename to github-data/pull_requests/559 - Use cuBLAS for large batches and quants with block size 16.md diff --git a/github-data/pull_requests/56-BF16 support on Metal.md b/github-data/pull_requests/56 - BF16 support on Metal.md similarity index 100% rename from github-data/pull_requests/56-BF16 support on Metal.md rename to github-data/pull_requests/56 - BF16 support on Metal.md diff --git a/github-data/pull_requests/560-Remove what appears to be unnecessary asserts in ggml_cuda_cpy.md b/github-data/pull_requests/560 - Remove what appears to be unnecessary asserts in ggml_cuda_cpy.md similarity index 100% rename from github-data/pull_requests/560-Remove what appears to be unnecessary asserts in ggml_cuda_cpy.md rename to github-data/pull_requests/560 - Remove what appears to be unnecessary asserts in ggml_cuda_cpy.md diff --git a/github-data/pull_requests/563-Merge vulkan code from mainline up to commit of 6_28_2025.md b/github-data/pull_requests/563 - Merge vulkan code from mainline up to commit of 6_28_2025.md similarity index 100% rename from github-data/pull_requests/563-Merge vulkan code from mainline up to commit of 6_28_2025.md rename to github-data/pull_requests/563 - Merge vulkan code from mainline up to commit of 6_28_2025.md diff --git 
a/github-data/pull_requests/565-add hunyuan moe support for 561.md b/github-data/pull_requests/565 - add hunyuan moe support for 561.md similarity index 100% rename from github-data/pull_requests/565-add hunyuan moe support for 561.md rename to github-data/pull_requests/565 - add hunyuan moe support for 561.md diff --git a/github-data/pull_requests/566-Adding IQ3_KS quants.md b/github-data/pull_requests/566 - Adding IQ3_KS quants.md similarity index 100% rename from github-data/pull_requests/566-Adding IQ3_KS quants.md rename to github-data/pull_requests/566 - Adding IQ3_KS quants.md diff --git a/github-data/pull_requests/567-Minor CUDA PP speed improvement.md b/github-data/pull_requests/567 - Minor CUDA PP speed improvement.md similarity index 100% rename from github-data/pull_requests/567-Minor CUDA PP speed improvement.md rename to github-data/pull_requests/567 - Minor CUDA PP speed improvement.md diff --git a/github-data/pull_requests/569-Conditionally disable fused ops when building with Vulkan enabled.md b/github-data/pull_requests/569 - Conditionally disable fused ops when building with Vulkan enabled.md similarity index 100% rename from github-data/pull_requests/569-Conditionally disable fused ops when building with Vulkan enabled.md rename to github-data/pull_requests/569 - Conditionally disable fused ops when building with Vulkan enabled.md diff --git a/github-data/pull_requests/57-AVX2_Zen4 horizontal sums .md b/github-data/pull_requests/57 - AVX2_Zen4 horizontal sums.md similarity index 100% rename from github-data/pull_requests/57-AVX2_Zen4 horizontal sums .md rename to github-data/pull_requests/57 - AVX2_Zen4 horizontal sums.md diff --git a/github-data/pull_requests/570-Remove duplicate_misplaced cmake find_package for Vulkan.md b/github-data/pull_requests/570 - Remove duplicate_misplaced cmake find_package for Vulkan.md similarity index 100% rename from github-data/pull_requests/570-Remove duplicate_misplaced cmake find_package for Vulkan.md rename 
to github-data/pull_requests/570 - Remove duplicate_misplaced cmake find_package for Vulkan.md diff --git a/github-data/pull_requests/571-Fix CMakeLists.md b/github-data/pull_requests/571 - Fix CMakeLists.md similarity index 100% rename from github-data/pull_requests/571-Fix CMakeLists.md rename to github-data/pull_requests/571 - Fix CMakeLists.md diff --git a/github-data/pull_requests/573-Support for dots.llm1 models.md b/github-data/pull_requests/573 - Support for dots.llm1 models.md similarity index 100% rename from github-data/pull_requests/573-Support for dots.llm1 models.md rename to github-data/pull_requests/573 - Support for dots.llm1 models.md diff --git a/github-data/pull_requests/574-Change KQ mask padding to 64.md b/github-data/pull_requests/574 - Change KQ mask padding to 64.md similarity index 100% rename from github-data/pull_requests/574-Change KQ mask padding to 64.md rename to github-data/pull_requests/574 - Change KQ mask padding to 64.md diff --git a/github-data/pull_requests/577-Vulkan_ fused rms norm.md b/github-data/pull_requests/577 - Vulkan_ fused rms norm.md similarity index 100% rename from github-data/pull_requests/577-Vulkan_ fused rms norm.md rename to github-data/pull_requests/577 - Vulkan_ fused rms norm.md diff --git a/github-data/pull_requests/578-Do not crash when there is no DRY sampler.md b/github-data/pull_requests/578 - Do not crash when there is no DRY sampler.md similarity index 100% rename from github-data/pull_requests/578-Do not crash when there is no DRY sampler.md rename to github-data/pull_requests/578 - Do not crash when there is no DRY sampler.md diff --git a/github-data/pull_requests/579-Fix debug build failure with RPC off.md b/github-data/pull_requests/579 - Fix debug build failure with RPC off.md similarity index 100% rename from github-data/pull_requests/579-Fix debug build failure with RPC off.md rename to github-data/pull_requests/579 - Fix debug build failure with RPC off.md diff --git 
a/github-data/pull_requests/58-Fix compiler warnings.md b/github-data/pull_requests/58 - Fix compiler warnings.md similarity index 100% rename from github-data/pull_requests/58-Fix compiler warnings.md rename to github-data/pull_requests/58 - Fix compiler warnings.md diff --git a/github-data/pull_requests/580-Vulkan_ add GGML_OP_FUSED_MUL_UNARY.md b/github-data/pull_requests/580 - Vulkan_ add GGML_OP_FUSED_MUL_UNARY.md similarity index 100% rename from github-data/pull_requests/580-Vulkan_ add GGML_OP_FUSED_MUL_UNARY.md rename to github-data/pull_requests/580 - Vulkan_ add GGML_OP_FUSED_MUL_UNARY.md diff --git a/github-data/pull_requests/581-Vulkan_ Disable multi-add for now.md b/github-data/pull_requests/581 - Vulkan_ Disable multi-add for now.md similarity index 100% rename from github-data/pull_requests/581-Vulkan_ Disable multi-add for now.md rename to github-data/pull_requests/581 - Vulkan_ Disable multi-add for now.md diff --git a/github-data/pull_requests/582-Vulkan_ adding GGML_OP_MULTI_ADD implementation.md b/github-data/pull_requests/582 - Vulkan_ adding GGML_OP_MULTI_ADD implementation.md similarity index 100% rename from github-data/pull_requests/582-Vulkan_ adding GGML_OP_MULTI_ADD implementation.md rename to github-data/pull_requests/582 - Vulkan_ adding GGML_OP_MULTI_ADD implementation.md diff --git a/github-data/pull_requests/583-Adding forgotten file.md b/github-data/pull_requests/583 - Adding forgotten file.md similarity index 100% rename from github-data/pull_requests/583-Adding forgotten file.md rename to github-data/pull_requests/583 - Adding forgotten file.md diff --git a/github-data/pull_requests/584-Vulkan_ flash attention for DeepSeek models.md b/github-data/pull_requests/584 - Vulkan_ flash attention for DeepSeek models.md similarity index 100% rename from github-data/pull_requests/584-Vulkan_ flash attention for DeepSeek models.md rename to github-data/pull_requests/584 - Vulkan_ flash attention for DeepSeek models.md diff --git 
a/github-data/pull_requests/585-Special handling of Seed Coder FIM tokens.md b/github-data/pull_requests/585 - Special handling of Seed Coder FIM tokens.md similarity index 100% rename from github-data/pull_requests/585-Special handling of Seed Coder FIM tokens.md rename to github-data/pull_requests/585 - Special handling of Seed Coder FIM tokens.md diff --git a/github-data/pull_requests/587-Fix crash when there is no DRY sampler.md b/github-data/pull_requests/587 - Fix crash when there is no DRY sampler.md similarity index 100% rename from github-data/pull_requests/587-Fix crash when there is no DRY sampler.md rename to github-data/pull_requests/587 - Fix crash when there is no DRY sampler.md diff --git a/github-data/pull_requests/588-Fix server crash when there is no DRY sampler.md b/github-data/pull_requests/588 - Fix server crash when there is no DRY sampler.md similarity index 100% rename from github-data/pull_requests/588-Fix server crash when there is no DRY sampler.md rename to github-data/pull_requests/588 - Fix server crash when there is no DRY sampler.md diff --git a/github-data/pull_requests/589-CUDA_ small PP performance improvement for MoE models.md b/github-data/pull_requests/589 - CUDA_ small PP performance improvement for MoE models.md similarity index 100% rename from github-data/pull_requests/589-CUDA_ small PP performance improvement for MoE models.md rename to github-data/pull_requests/589 - CUDA_ small PP performance improvement for MoE models.md diff --git a/github-data/pull_requests/592-Another minor readme update.md b/github-data/pull_requests/592 - Another minor readme update.md similarity index 100% rename from github-data/pull_requests/592-Another minor readme update.md rename to github-data/pull_requests/592 - Another minor readme update.md diff --git a/github-data/pull_requests/593-Faster prompt processing for IQ2_KS, IQ2_K, IQ2_K_R4.md b/github-data/pull_requests/593 - Faster prompt processing for IQ2_KS_ IQ2_K_ IQ2_K_R4.md similarity 
index 100% rename from github-data/pull_requests/593-Faster prompt processing for IQ2_KS, IQ2_K, IQ2_K_R4.md rename to github-data/pull_requests/593 - Faster prompt processing for IQ2_KS_ IQ2_K_ IQ2_K_R4.md diff --git a/github-data/pull_requests/595-CUDA_ Faster prompt processing for several quantization types .md b/github-data/pull_requests/595 - CUDA_ Faster prompt processing for several quantization types.md similarity index 100% rename from github-data/pull_requests/595-CUDA_ Faster prompt processing for several quantization types .md rename to github-data/pull_requests/595 - CUDA_ Faster prompt processing for several quantization types.md diff --git a/github-data/pull_requests/598-Vulkan_ iquants and flash attention split_k_reduce improvement.md b/github-data/pull_requests/598 - Vulkan_ iquants and flash attention split_k_reduce improvement.md similarity index 100% rename from github-data/pull_requests/598-Vulkan_ iquants and flash attention split_k_reduce improvement.md rename to github-data/pull_requests/598 - Vulkan_ iquants and flash attention split_k_reduce improvement.md diff --git a/github-data/pull_requests/6-IQ4_K_ SOTA 4-bit quantization.md b/github-data/pull_requests/6 - IQ4_K_ SOTA 4-bit quantization.md similarity index 100% rename from github-data/pull_requests/6-IQ4_K_ SOTA 4-bit quantization.md rename to github-data/pull_requests/6 - IQ4_K_ SOTA 4-bit quantization.md diff --git a/github-data/pull_requests/602-Adding IQ2_KL.md b/github-data/pull_requests/602 - Adding IQ2_KL.md similarity index 100% rename from github-data/pull_requests/602-Adding IQ2_KL.md rename to github-data/pull_requests/602 - Adding IQ2_KL.md diff --git a/github-data/pull_requests/603-Check if MMQ should be used before using it.md b/github-data/pull_requests/603 - Check if MMQ should be used before using it.md similarity index 100% rename from github-data/pull_requests/603-Check if MMQ should be used before using it.md rename to github-data/pull_requests/603 - Check if MMQ 
should be used before using it.md diff --git a/github-data/pull_requests/604-Fix attn_v conditionality when quantizing..md b/github-data/pull_requests/604 - Fix attn_v conditionality when quantizing..md similarity index 100% rename from github-data/pull_requests/604-Fix attn_v conditionality when quantizing..md rename to github-data/pull_requests/604 - Fix attn_v conditionality when quantizing..md diff --git a/github-data/pull_requests/606-Add iq3_ks to constants.py.md b/github-data/pull_requests/606 - Add iq3_ks to constants.py.md similarity index 100% rename from github-data/pull_requests/606-Add iq3_ks to constants.py.md rename to github-data/pull_requests/606 - Add iq3_ks to constants.py.md diff --git a/github-data/pull_requests/607-vulkan_ support softmax_FA batch and broadcast.md b/github-data/pull_requests/607 - vulkan_ support softmax_FA batch and broadcast.md similarity index 100% rename from github-data/pull_requests/607-vulkan_ support softmax_FA batch and broadcast.md rename to github-data/pull_requests/607 - vulkan_ support softmax_FA batch and broadcast.md diff --git a/github-data/pull_requests/608-Vulkan_ a fresh start.md b/github-data/pull_requests/608 - Vulkan_ a fresh start.md similarity index 100% rename from github-data/pull_requests/608-Vulkan_ a fresh start.md rename to github-data/pull_requests/608 - Vulkan_ a fresh start.md diff --git a/github-data/pull_requests/609-Added kimi-k2 support (ported from llama.cpp).md b/github-data/pull_requests/609 - Added kimi-k2 support _ported from llama.cpp_.md similarity index 100% rename from github-data/pull_requests/609-Added kimi-k2 support (ported from llama.cpp).md rename to github-data/pull_requests/609 - Added kimi-k2 support _ported from llama.cpp_.md diff --git a/github-data/pull_requests/61-Adding ability to have meta data per tensor row.md b/github-data/pull_requests/61 - Adding ability to have meta data per tensor row.md similarity index 100% rename from github-data/pull_requests/61-Adding 
ability to have meta data per tensor row.md rename to github-data/pull_requests/61 - Adding ability to have meta data per tensor row.md diff --git a/github-data/pull_requests/610-q8_k_r8_ experimental AVX512 version.md b/github-data/pull_requests/610 - q8_k_r8_ experimental AVX512 version.md similarity index 100% rename from github-data/pull_requests/610-q8_k_r8_ experimental AVX512 version.md rename to github-data/pull_requests/610 - q8_k_r8_ experimental AVX512 version.md diff --git a/github-data/pull_requests/611-Bump GGML_MAX_CONTEXTS to allow loading more shards.md b/github-data/pull_requests/611 - Bump GGML_MAX_CONTEXTS to allow loading more shards.md similarity index 100% rename from github-data/pull_requests/611-Bump GGML_MAX_CONTEXTS to allow loading more shards.md rename to github-data/pull_requests/611 - Bump GGML_MAX_CONTEXTS to allow loading more shards.md diff --git a/github-data/pull_requests/612-kimi-k2 convert script and chat template.md b/github-data/pull_requests/612 - kimi-k2 convert script and chat template.md similarity index 100% rename from github-data/pull_requests/612-kimi-k2 convert script and chat template.md rename to github-data/pull_requests/612 - kimi-k2 convert script and chat template.md diff --git a/github-data/pull_requests/616-Adding IQ1_KT - 1.75 bpw SOTA quants.md b/github-data/pull_requests/616 - Adding IQ1_KT - 1.75 bpw SOTA quants.md similarity index 100% rename from github-data/pull_requests/616-Adding IQ1_KT - 1.75 bpw SOTA quants.md rename to github-data/pull_requests/616 - Adding IQ1_KT - 1.75 bpw SOTA quants.md diff --git a/github-data/pull_requests/617-Fixup kimi-k2 convert indentation.md b/github-data/pull_requests/617 - Fixup kimi-k2 convert indentation.md similarity index 100% rename from github-data/pull_requests/617-Fixup kimi-k2 convert indentation.md rename to github-data/pull_requests/617 - Fixup kimi-k2 convert indentation.md diff --git a/github-data/pull_requests/618-Webui_ New Features for Conversations, 
Settings, and Chat Messages.md b/github-data/pull_requests/618 - Webui_ New Features for Conversations_ Settings_ and Chat Messages.md similarity index 100% rename from github-data/pull_requests/618-Webui_ New Features for Conversations, Settings, and Chat Messages.md rename to github-data/pull_requests/618 - Webui_ New Features for Conversations_ Settings_ and Chat Messages.md diff --git a/github-data/pull_requests/62-Use fp32 for K_Q in Metal FA implementation.md b/github-data/pull_requests/62 - Use fp32 for K_Q in Metal FA implementation.md similarity index 100% rename from github-data/pull_requests/62-Use fp32 for K_Q in Metal FA implementation.md rename to github-data/pull_requests/62 - Use fp32 for K_Q in Metal FA implementation.md diff --git a/github-data/pull_requests/620-Bump Windows max open files from 512 to 2048.md b/github-data/pull_requests/620 - Bump Windows max open files from 512 to 2048.md similarity index 100% rename from github-data/pull_requests/620-Bump Windows max open files from 512 to 2048.md rename to github-data/pull_requests/620 - Bump Windows max open files from 512 to 2048.md diff --git a/github-data/pull_requests/622-Add GGML_MAX_CONTEXTS definition in CMakeLists.txt.md b/github-data/pull_requests/622 - Add GGML_MAX_CONTEXTS definition in CMakeLists.txt.md similarity index 100% rename from github-data/pull_requests/622-Add GGML_MAX_CONTEXTS definition in CMakeLists.txt.md rename to github-data/pull_requests/622 - Add GGML_MAX_CONTEXTS definition in CMakeLists.txt.md diff --git a/github-data/pull_requests/624-Quantization tweaks.md b/github-data/pull_requests/624 - Quantization tweaks.md similarity index 100% rename from github-data/pull_requests/624-Quantization tweaks.md rename to github-data/pull_requests/624 - Quantization tweaks.md diff --git a/github-data/pull_requests/628-[Draft] Function calling support for Kimi-K2.md b/github-data/pull_requests/628 - _Draft_ Function calling support for Kimi-K2.md similarity index 100% rename 
from github-data/pull_requests/628-[Draft] Function calling support for Kimi-K2.md rename to github-data/pull_requests/628 - _Draft_ Function calling support for Kimi-K2.md diff --git a/github-data/pull_requests/630-GEMM for IQ1_M.md b/github-data/pull_requests/630 - GEMM for IQ1_M.md similarity index 100% rename from github-data/pull_requests/630-GEMM for IQ1_M.md rename to github-data/pull_requests/630 - GEMM for IQ1_M.md diff --git a/github-data/pull_requests/64-Better sub-3-bit quantization mixes with a qkv tensor.md b/github-data/pull_requests/64 - Better sub-3-bit quantization mixes with a qkv tensor.md similarity index 100% rename from github-data/pull_requests/64-Better sub-3-bit quantization mixes with a qkv tensor.md rename to github-data/pull_requests/64 - Better sub-3-bit quantization mixes with a qkv tensor.md diff --git a/github-data/pull_requests/65-Adding SWIGLU unary op.md b/github-data/pull_requests/65 - Adding SWIGLU unary op.md similarity index 100% rename from github-data/pull_requests/65-Adding SWIGLU unary op.md rename to github-data/pull_requests/65 - Adding SWIGLU unary op.md diff --git a/github-data/pull_requests/66-CUDA non-contiguous RoPE.md b/github-data/pull_requests/66 - CUDA non-contiguous RoPE.md similarity index 100% rename from github-data/pull_requests/66-CUDA non-contiguous RoPE.md rename to github-data/pull_requests/66 - CUDA non-contiguous RoPE.md diff --git a/github-data/pull_requests/68-It is time to fix replace_all.md b/github-data/pull_requests/68 - It is time to fix replace_all.md similarity index 100% rename from github-data/pull_requests/68-It is time to fix replace_all.md rename to github-data/pull_requests/68 - It is time to fix replace_all.md diff --git a/github-data/pull_requests/69-Allow bf16 kv-cache.md b/github-data/pull_requests/69 - Allow bf16 kv-cache.md similarity index 100% rename from github-data/pull_requests/69-Allow bf16 kv-cache.md rename to github-data/pull_requests/69 - Allow bf16 kv-cache.md diff 
--git a/github-data/pull_requests/7-Adding IQ2_K, IQ3_K and IQ5_K.md b/github-data/pull_requests/7 - Adding IQ2_K_ IQ3_K and IQ5_K.md similarity index 100% rename from github-data/pull_requests/7-Adding IQ2_K, IQ3_K and IQ5_K.md rename to github-data/pull_requests/7 - Adding IQ2_K_ IQ3_K and IQ5_K.md diff --git a/github-data/pull_requests/70-Fused unary(x)_y.md b/github-data/pull_requests/70 - Fused unary_x_y.md similarity index 100% rename from github-data/pull_requests/70-Fused unary(x)_y.md rename to github-data/pull_requests/70 - Fused unary_x_y.md diff --git a/github-data/pull_requests/71-iqk_mul_mat_ better srategy when nrc_y not divisible by ny.md b/github-data/pull_requests/71 - iqk_mul_mat_ better srategy when nrc_y not divisible by ny.md similarity index 100% rename from github-data/pull_requests/71-iqk_mul_mat_ better srategy when nrc_y not divisible by ny.md rename to github-data/pull_requests/71 - iqk_mul_mat_ better srategy when nrc_y not divisible by ny.md diff --git a/github-data/pull_requests/72-iqk_mul_mat_ better iq4_nl implementation on Zen4_AVX2.md b/github-data/pull_requests/72 - iqk_mul_mat_ better iq4_nl implementation on Zen4_AVX2.md similarity index 100% rename from github-data/pull_requests/72-iqk_mul_mat_ better iq4_nl implementation on Zen4_AVX2.md rename to github-data/pull_requests/72 - iqk_mul_mat_ better iq4_nl implementation on Zen4_AVX2.md diff --git a/github-data/pull_requests/73-CUDA_ faster float -_ iq4_nl conversion.md b/github-data/pull_requests/73 - CUDA_ faster float -_ iq4_nl conversion.md similarity index 100% rename from github-data/pull_requests/73-CUDA_ faster float -_ iq4_nl conversion.md rename to github-data/pull_requests/73 - CUDA_ faster float -_ iq4_nl conversion.md diff --git a/github-data/pull_requests/74-IQ4_NL kv-cache on the CPU (Zen4_AVX2_ARM_NEON).md b/github-data/pull_requests/74 - IQ4_NL kv-cache on the CPU _Zen4_AVX2_ARM_NEON_.md similarity index 100% rename from github-data/pull_requests/74-IQ4_NL 
kv-cache on the CPU (Zen4_AVX2_ARM_NEON).md rename to github-data/pull_requests/74 - IQ4_NL kv-cache on the CPU _Zen4_AVX2_ARM_NEON_.md diff --git a/github-data/pull_requests/75-Fix Q5_0 flash attention.md b/github-data/pull_requests/75 - Fix Q5_0 flash attention.md similarity index 100% rename from github-data/pull_requests/75-Fix Q5_0 flash attention.md rename to github-data/pull_requests/75 - Fix Q5_0 flash attention.md diff --git a/github-data/pull_requests/76-iq4_nl_ faster quantization.md b/github-data/pull_requests/76 - iq4_nl_ faster quantization.md similarity index 100% rename from github-data/pull_requests/76-iq4_nl_ faster quantization.md rename to github-data/pull_requests/76 - iq4_nl_ faster quantization.md diff --git a/github-data/pull_requests/77-Adding Q6_0.md b/github-data/pull_requests/77 - Adding Q6_0.md similarity index 100% rename from github-data/pull_requests/77-Adding Q6_0.md rename to github-data/pull_requests/77 - Adding Q6_0.md diff --git a/github-data/pull_requests/78-q6_0_ Slightly faster Zen4_AVX2.md b/github-data/pull_requests/78 - q6_0_ Slightly faster Zen4_AVX2.md similarity index 100% rename from github-data/pull_requests/78-q6_0_ Slightly faster Zen4_AVX2.md rename to github-data/pull_requests/78 - q6_0_ Slightly faster Zen4_AVX2.md diff --git a/github-data/pull_requests/79-Do not quantize activations if not necessary.md b/github-data/pull_requests/79 - Do not quantize activations if not necessary.md similarity index 100% rename from github-data/pull_requests/79-Do not quantize activations if not necessary.md rename to github-data/pull_requests/79 - Do not quantize activations if not necessary.md diff --git a/github-data/pull_requests/80-Move to c++17 projectwide.md b/github-data/pull_requests/80 - Move to c_17 projectwide.md similarity index 100% rename from github-data/pull_requests/80-Move to c++17 projectwide.md rename to github-data/pull_requests/80 - Move to c_17 projectwide.md diff --git 
a/github-data/pull_requests/81-Cleanup scale fudge factors.md b/github-data/pull_requests/81 - Cleanup scale fudge factors.md similarity index 100% rename from github-data/pull_requests/81-Cleanup scale fudge factors.md rename to github-data/pull_requests/81 - Cleanup scale fudge factors.md diff --git a/github-data/pull_requests/83-New SOTA quantization_ 4.25 bpw IQ4_KS.md b/github-data/pull_requests/83 - New SOTA quantization_ 4.25 bpw IQ4_KS.md similarity index 100% rename from github-data/pull_requests/83-New SOTA quantization_ 4.25 bpw IQ4_KS.md rename to github-data/pull_requests/83 - New SOTA quantization_ 4.25 bpw IQ4_KS.md diff --git a/github-data/pull_requests/84-Better model info.md b/github-data/pull_requests/84 - Better model info.md similarity index 100% rename from github-data/pull_requests/84-Better model info.md rename to github-data/pull_requests/84 - Better model info.md diff --git a/github-data/pull_requests/85-IQ2_KS_ 2.1875 bpw non-linear quantization.md b/github-data/pull_requests/85 - IQ2_KS_ 2.1875 bpw non-linear quantization.md similarity index 100% rename from github-data/pull_requests/85-IQ2_KS_ 2.1875 bpw non-linear quantization.md rename to github-data/pull_requests/85 - IQ2_KS_ 2.1875 bpw non-linear quantization.md diff --git a/github-data/pull_requests/86-Fix and optimize iq2k Metal implementation.md b/github-data/pull_requests/86 - Fix and optimize iq2k Metal implementation.md similarity index 100% rename from github-data/pull_requests/86-Fix and optimize iq2k Metal implementation.md rename to github-data/pull_requests/86 - Fix and optimize iq2k Metal implementation.md diff --git a/github-data/pull_requests/87-iq3_k_ fix and optimize Metal dot product.md b/github-data/pull_requests/87 - iq3_k_ fix and optimize Metal dot product.md similarity index 100% rename from github-data/pull_requests/87-iq3_k_ fix and optimize Metal dot product.md rename to github-data/pull_requests/87 - iq3_k_ fix and optimize Metal dot product.md diff --git 
a/github-data/pull_requests/89-Adding IQ4_KSS_ 4.0 bpw quants.md b/github-data/pull_requests/89 - Adding IQ4_KSS_ 4.0 bpw quants.md similarity index 100% rename from github-data/pull_requests/89-Adding IQ4_KSS_ 4.0 bpw quants.md rename to github-data/pull_requests/89 - Adding IQ4_KSS_ 4.0 bpw quants.md diff --git a/github-data/pull_requests/9-Fused soft cap and SIMD-ified GeLU .md b/github-data/pull_requests/9 - Fused soft cap and SIMD-ified GeLU.md similarity index 100% rename from github-data/pull_requests/9-Fused soft cap and SIMD-ified GeLU .md rename to github-data/pull_requests/9 - Fused soft cap and SIMD-ified GeLU.md diff --git a/github-data/pull_requests/90-iq4_ks_ faster dot product on Metal.md b/github-data/pull_requests/90 - iq4_ks_ faster dot product on Metal.md similarity index 100% rename from github-data/pull_requests/90-iq4_ks_ faster dot product on Metal.md rename to github-data/pull_requests/90 - iq4_ks_ faster dot product on Metal.md diff --git a/github-data/pull_requests/91-CLI - Specify GGML_TYPE to quantize for the main tensors..md b/github-data/pull_requests/91 - CLI - Specify GGML_TYPE to quantize for the main tensors..md similarity index 100% rename from github-data/pull_requests/91-CLI - Specify GGML_TYPE to quantize for the main tensors..md rename to github-data/pull_requests/91 - CLI - Specify GGML_TYPE to quantize for the main tensors..md diff --git a/github-data/pull_requests/93-Attempt to blindly fix Windows build failure.md b/github-data/pull_requests/93 - Attempt to blindly fix Windows build failure.md similarity index 100% rename from github-data/pull_requests/93-Attempt to blindly fix Windows build failure.md rename to github-data/pull_requests/93 - Attempt to blindly fix Windows build failure.md diff --git a/github-data/pull_requests/94-Adding @agray3's graph caching approach.md b/github-data/pull_requests/94 - Adding _agray3_s graph caching approach.md similarity index 100% rename from github-data/pull_requests/94-Adding 
@agray3's graph caching approach.md rename to github-data/pull_requests/94 - Adding _agray3_s graph caching approach.md diff --git a/github-data/pull_requests/96-Quant strategies_ attn_q Q4 & attn_v Q6 for Llama 3.1 Q5_K_S.md b/github-data/pull_requests/96 - Quant strategies_ attn_q Q4 _ attn_v Q6 for Llama 3.1 Q5_K_S.md similarity index 100% rename from github-data/pull_requests/96-Quant strategies_ attn_q Q4 & attn_v Q6 for Llama 3.1 Q5_K_S.md rename to github-data/pull_requests/96 - Quant strategies_ attn_q Q4 _ attn_v Q6 for Llama 3.1 Q5_K_S.md diff --git a/github-data/pull_requests/97-Bitnet_ make the scale tensors optional.md b/github-data/pull_requests/97 - Bitnet_ make the scale tensors optional.md similarity index 100% rename from github-data/pull_requests/97-Bitnet_ make the scale tensors optional.md rename to github-data/pull_requests/97 - Bitnet_ make the scale tensors optional.md diff --git a/github-data/pull_requests/98-Avoid rebuild of GGML graph for each token.md b/github-data/pull_requests/98 - Avoid rebuild of GGML graph for each token.md similarity index 100% rename from github-data/pull_requests/98-Avoid rebuild of GGML graph for each token.md rename to github-data/pull_requests/98 - Avoid rebuild of GGML graph for each token.md diff --git a/github-data/pull_requests/99-Enable IQ4_NL for KV-cache in token generation using Flash Attention .md b/github-data/pull_requests/99 - Enable IQ4_NL for KV-cache in token generation using Flash Attention.md similarity index 100% rename from github-data/pull_requests/99-Enable IQ4_NL for KV-cache in token generation using Flash Attention .md rename to github-data/pull_requests/99 - Enable IQ4_NL for KV-cache in token generation using Flash Attention.md