mirror of https://github.com/ikawrakow/ik_llama.cpp.git
🔀 #555 - Add Falcon-Edge support
| Author | ikawrakow |
|---|---|
| State | ❌ Closed |
| Created | 2025-06-25 |
| Updated | 2025-06-26 |
Description
Closes #551
How to use:
- Grab a GGUF containing Microsoft's `i2_s` quant packing. E.g.,

```
huggingface-cli download --local-dir falcon tiiuae/Falcon-E-3B-Instruct-GGUF
```
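Optionally, a sketch that fetches only the file needed for the next step (assuming `huggingface-cli download` accepts file names after the repo ID; the file name is taken from the conversion command below):

```
# Download only the i2_s GGUF used by the conversion step below
huggingface-cli download --local-dir falcon tiiuae/Falcon-E-3B-Instruct-GGUF ggml-model-i2_s.gguf
```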
- Convert to `ik_llama.cpp` quants `iq2_bn` or `iq1_bn`. `iq2_bn` uses 2 bits per weight (bpw), `iq1_bn` uses 1.625 bpw. `iq2_bn` is faster for prompt processing, and may also be faster for token generation (TG) on devices with limited computing power. `iq1_bn` uses 20% less RAM, so if TG is memory bound it will be slightly faster than `iq2_bn`. The command to convert is

```
./bin/llama-quantize --allow-requantize falcon/ggml-model-i2_s.gguf falcon_iq2_bn.gguf iq2_bn
```

(replace `iq2_bn` with `iq1_bn` if you prefer the smaller variant; the variant command is shown below).
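For reference, a sketch of the `iq1_bn` variant of the same conversion (the output file name is just an example):

```
# Same conversion targeting the smaller 1.625-bpw iq1_bn variant;
# the output file name is illustrative.
./bin/llama-quantize --allow-requantize falcon/ggml-model-i2_s.gguf falcon_iq1_bn.gguf iq1_bn
```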
- Use the newly created model file in the usual way with `llama-cli`, `llama-server`, etc.
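A minimal sketch of that last step (flags follow common `llama.cpp` conventions; model path, prompt, token count, and port are illustrative):

```
# Quick generation test with the converted model
./bin/llama-cli -m falcon_iq2_bn.gguf -p "Write a haiku about falcons." -n 64

# Or serve it over HTTP
./bin/llama-server -m falcon_iq2_bn.gguf --port 8080
```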