mirror of
https://github.com/ikawrakow/ik_llama.cpp.git
synced 2026-02-10 08:20:09 +00:00
52 lines
2.0 KiB
Markdown
52 lines
2.0 KiB
Markdown
### 📝 [#498](https://github.com/ikawrakow/ik_llama.cpp/issues/498) - question: about quantize method
|
|
|
|
| **Author** | `nigelzzz` |
|
|
| :--- | :--- |
|
|
| **State** | ❌ **Closed** |
|
|
| **Created** | 2025-06-06 |
|
|
| **Updated** | 2025-06-14 |
|
|
|
|
---
|
|
|
|
#### Description
|
|
|
|
Hi,
|
|
the project is amazing and interesting, looks like it better thank origin llama.cpp.
|
|
|
|
I would like to study the repo and because there are a lot of quantize method same with origin llama.cpp, can i know how to choose quantize method to study.
|
|
|
|
my env is rpi5, and i often test bitnet and llama3.2 1b or 3b.
|
|
|
|
thanks
|
|
|
|
---
|
|
|
|
#### 💬 Conversation
|
|
|
|
👤 **ikawrakow** commented the **2025-06-06** at **16:39:00**:<br>
|
|
|
|
For BitNet take a look at `IQ1_BN` and `IQ2_BN`. The packing in `IQ2_BN` is simpler and easier to understand, but uses 2 bits per weight. `IQ1_BN` uses 1.625 bits per weight, which is very close to the theoretical 1.58 bits for a ternary data type.
|
|
|
|
Otherwise not sure what to recommend. Any of the quantization types should be OK for LlaMA-3.1-1B/3B on Rpi5. If you are new to the subject, it might be better to look into the simpler quantization types (e.g., `QX_K`) first.
|
|
|
|
---
|
|
|
|
👤 **aezendc** commented the **2025-06-09** at **10:48:49**:<br>
|
|
|
|
> For BitNet take a look at `IQ1_BN` and `IQ2_BN`. The packing in `IQ2_BN` is simpler and easier to understand, but uses 2 bits per weight. `IQ1_BN` uses 1.625 bits per weight, which is very close to the theoretical 1.58 bits for a ternary data type.
|
|
>
|
|
> Otherwise not sure what to recommend. Any of the quantization types should be OK for LlaMA-3.1-1B/3B on Rpi5. If you are new to the subject, it might be better to look into the simpler quantization types (e.g., `QX_K`) first.
|
|
|
|
I like the iq1_bn quantize. Its good and I am using it. Is there a way we can make this support function calling?
|
|
|
|
---
|
|
|
|
👤 **ikawrakow** commented the **2025-06-09** at **11:01:33**:<br>
|
|
|
|
See #407
|
|
|
|
---
|
|
|
|
👤 **ikawrakow** commented the **2025-06-14** at **12:01:58**:<br>
|
|
|
|
I think we can close it. |