Update README with warning about '_XL' models from Unsloth

Added important note regarding quantized models from Unsloth.
2026-05-11 00:20:19 +00:00 · 2026-02-22 07:42:17 +01:00
parent bd387a279a
commit cbf7fc7e2f
1 changed file with 3 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -6,6 +6,9 @@

 This repository is a fork of [llama.cpp](https://github.com/ggerganov/llama.cpp) with better CPU and hybrid GPU/CPU performance, new SOTA quantization types, first-class Bitnet support, better DeepSeek performance via MLA, FlashMLA, fused MoE operations and tensor overrides for hybrid GPU/CPU inference, row-interleaved quant packing, etc.

+>[!IMPORTANT]
+>Do not use quantized models from Unsloth that have `_XL` in their name. These are unlikely to work with `ik_llama.cpp`.
+  
 ## Quickstart

 ### Prerequisites