### 📝 [#498](https://github.com/ikawrakow/ik_llama.cpp/issues/498) - question: about quantize method
| **Author** | `nigelzzz` |
| :--- | :--- |
| **State** | ❌ **Closed** |
| **Created** | 2025-06-06 |
| **Updated** | 2025-06-14 |
---
#### Description
Hi,
the project is amazing and interesting; it looks better than the original llama.cpp.

I would like to study the repo. Since many of the quantization methods are the same as in the original llama.cpp, how should I choose which quantization method to study?

My environment is an RPi 5, and I often test BitNet and Llama 3.2 1B or 3B.

Thanks

---
#### 💬 Conversation
👤 **ikawrakow** commented on **2025-06-06** at **16:39:00**:<br>
For BitNet take a look at `IQ1_BN` and `IQ2_BN`. The packing in `IQ2_BN` is simpler and easier to understand, but uses 2 bits per weight. `IQ1_BN` uses 1.625 bits per weight, which is very close to the theoretical 1.58 bits for a ternary data type.
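
For intuition, here is a minimal, self-contained sketch of why ~1.6 bits per weight is reachable: five ternary digits fit into one byte because 3^5 = 243 ≤ 255, so a block of 64 weights needs ⌈64/5⌉ = 13 bytes, i.e. 13 × 8 / 64 = 1.625 bits per weight, close to the theoretical minimum of log2(3) ≈ 1.585 bits. This is only the counting argument written as code, not the actual `IQ1_BN` block layout from this repo.

```cpp
// Illustration only: base-3 packing of ternary weights {-1, 0, +1}.
// Five ternary digits per byte (3^5 = 243 <= 255), so 64 weights -> 13 bytes
// -> 1.625 bits/weight. NOT the real IQ1_BN layout, just the arithmetic.
#include <cstdint>
#include <cstdio>
#include <vector>

// t holds ternary digits 0, 1, 2 (standing for weights -1, 0, +1).
std::vector<uint8_t> pack_ternary(const std::vector<uint8_t> &t) {
    static const uint8_t pow3[5] = {1, 3, 9, 27, 81};
    std::vector<uint8_t> out((t.size() + 4) / 5, 0);
    for (size_t i = 0; i < t.size(); ++i)
        out[i / 5] += t[i] * pow3[i % 5];          // add the i-th base-3 digit
    return out;
}

std::vector<uint8_t> unpack_ternary(const std::vector<uint8_t> &packed, size_t n) {
    std::vector<uint8_t> t(n);
    for (size_t i = 0; i < n; ++i) {
        uint8_t v = packed[i / 5];
        for (size_t k = 0; k < i % 5; ++k) v /= 3; // strip the lower digits
        t[i] = v % 3;
    }
    return t;
}

int main() {
    std::vector<uint8_t> w(64);
    for (size_t i = 0; i < w.size(); ++i) w[i] = i % 3;   // dummy ternary data
    const auto packed = pack_ternary(w);
    printf("%zu weights -> %zu bytes = %.3f bits/weight\n",
           w.size(), packed.size(), 8.0 * packed.size() / w.size());
    printf("round trip ok: %d\n", (int)(unpack_ternary(packed, w.size()) == w));
    return 0;
}
```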
Otherwise I'm not sure what to recommend. Any of the quantization types should be OK for Llama-3.2-1B/3B on an RPi 5. If you are new to the subject, it might be better to look into the simpler quantization types (e.g., `QX_K`) first.
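
Regarding the simpler types: the core idea most of them share is "one scale per block of weights plus low-bit integers". Below is a minimal sketch of that idea (blocks of 32 weights, one float scale, signed 4-bit values, which works out to roughly 4.5 bits/weight when the scale is stored as fp16). It is not the exact bit layout of any format in this repo; for the real implementations see the quantization sources (e.g., `ggml-quants.c`).

```cpp
// Illustration only: "per-block scale + low-bit integers", the idea behind the
// simpler block-quantization formats. Not the exact layout of any repo format.
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstdio>

constexpr int kBlockSize = 32;

// Quantize one block: choose the scale so the largest |weight| maps to 7,
// then round every weight to the nearest integer step.
void quantize_block(const float *x, int8_t *q, float *scale) {
    float amax = 0.0f;
    for (int i = 0; i < kBlockSize; ++i) amax = std::max(amax, std::fabs(x[i]));
    *scale = amax / 7.0f;
    const float inv = *scale > 0.0f ? 1.0f / *scale : 0.0f;
    for (int i = 0; i < kBlockSize; ++i) {
        const int v = (int)std::lround(x[i] * inv);
        q[i] = (int8_t)std::clamp(v, -8, 7);       // signed 4-bit range
    }
}

// Dequantize: weight ~= scale * q.
void dequantize_block(const int8_t *q, float scale, float *y) {
    for (int i = 0; i < kBlockSize; ++i) y[i] = scale * q[i];
}

int main() {
    float x[kBlockSize], y[kBlockSize];
    int8_t q[kBlockSize];
    float scale;
    for (int i = 0; i < kBlockSize; ++i) x[i] = std::sin(0.37f * i);  // dummy weights
    quantize_block(x, q, &scale);
    dequantize_block(q, scale, y);
    double se = 0.0;
    for (int i = 0; i < kBlockSize; ++i) se += (x[i] - y[i]) * (x[i] - y[i]);
    printf("scale = %.4f  rmse = %.4f\n", scale, std::sqrt(se / kBlockSize));
    return 0;
}
```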
---
👤 **aezendc** commented on **2025-06-09** at **10:48:49**:<br>
> For BitNet take a look at `IQ1_BN` and `IQ2_BN`. The packing in `IQ2_BN` is simpler and easier to understand, but uses 2 bits per weight. `IQ1_BN` uses 1.625 bits per weight, which is very close to the theoretical 1.58 bits for a ternary data type.
>
> Otherwise I'm not sure what to recommend. Any of the quantization types should be OK for Llama-3.2-1B/3B on an RPi 5. If you are new to the subject, it might be better to look into the simpler quantization types (e.g., `QX_K`) first.

I like the `IQ1_BN` quantization. It's good and I am using it. Is there a way we can make this support function calling?

---
👤 **ikawrakow** commented on **2025-06-09** at **11:01:33**:<br>
See #407

---
👤 **ikawrakow** commented on **2025-06-14** at **12:01:58**:<br>
I think we can close it.