📝 #498 - question: about quantize method
| Author | nigelzzz |
|---|---|
| State | ❌ Closed |
| Created | 2025-06-06 |
| Updated | 2025-06-14 |
Description
Hi, the project is amazing and interesting; it looks like it is better than the original llama.cpp.
I would like to study the repo. Since there are a lot of quantization methods, same as in the original llama.cpp, can you suggest which quantization method to study?
My environment is an RPi 5, and I often test BitNet and Llama 3.2 1B or 3B.
Thanks
💬 Conversation
👤 ikawrakow commented on 2025-06-06 at 16:39:00:
For BitNet take a look at IQ1_BN and IQ2_BN. The packing in IQ2_BN is simpler and easier to understand, but uses 2 bits per weight. IQ1_BN uses 1.625 bits per weight, which is very close to the theoretical 1.58 bits for a ternary data type.
Otherwise, not sure what to recommend. Any of the quantization types should be OK for Llama-3.2-1B/3B on an RPi 5. If you are new to the subject, it might be better to look into the simpler quantization types (e.g., QX_K) first.
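To make the bit counts above concrete, here is a minimal, self-contained sketch. This is my own illustration, not the actual IQ1_BN/IQ2_BN layouts from this repo: it shows how ternary weights {-1, 0, +1} can be stored at 2 bits each (four per byte, as in the simpler IQ2_BN-style approach), and at ~1.6 bits each by packing 5 ternary digits into one byte in base 3, since 3^5 = 243 fits in 8 bits. The theoretical floor is log2(3) ≈ 1.585 bits per weight; the real IQ1_BN format reaches 1.625 bits per weight with its own, more elaborate packing.

```cpp
// Illustrative sketch only -- not the actual IQ1_BN/IQ2_BN block layouts.
#include <cstdint>
#include <cstdio>

// 2 bits/weight: pack 4 ternary weights (-1, 0, +1) into one byte.
uint8_t pack4_2bit(const int8_t w[4]) {
    uint8_t b = 0;
    for (int i = 0; i < 4; ++i) b |= uint8_t(w[i] + 1) << (2 * i);
    return b;
}

int8_t unpack_2bit(uint8_t b, int i) {
    return int8_t(((b >> (2 * i)) & 3) - 1);  // 2-bit field, shift back to {-1,0,+1}
}

// ~1.6 bits/weight: pack 5 ternary weights into one byte as a base-3 number
// (3^5 = 243 <= 255, so five ternary digits fit in 8 bits).
uint8_t pack5_base3(const int8_t w[5]) {
    uint8_t b = 0;
    for (int i = 4; i >= 0; --i) b = uint8_t(3 * b + (w[i] + 1));
    return b;
}

int8_t unpack_base3(uint8_t b, int i) {
    for (int k = 0; k < i; ++k) b /= 3;  // drop the lower base-3 digits
    return int8_t(b % 3 - 1);
}

int main() {
    const int8_t w[5] = {-1, 0, 1, 1, -1};
    uint8_t b2 = pack4_2bit(w);   // first 4 weights at 2 bits each
    uint8_t b3 = pack5_base3(w);  // all 5 weights in one byte
    for (int i = 0; i < 4; ++i) printf("%d ", unpack_2bit(b2, i));
    printf("| ");
    for (int i = 0; i < 5; ++i) printf("%d ", unpack_base3(b3, i));
    printf("\n");  // prints: -1 0 1 1 | -1 0 1 1 -1
    return 0;
}
```

The trade-off the comment describes falls out directly: the 2-bit scheme unpacks with a shift and a mask, while the base-3 scheme needs divisions (or a lookup table) to recover each digit, which is why the denser format is harder to understand and to decode quickly.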
👤 aezendc commented on 2025-06-09 at 10:48:49:
> For BitNet take a look at IQ1_BN and IQ2_BN. The packing in IQ2_BN is simpler and easier to understand, but uses 2 bits per weight. IQ1_BN uses 1.625 bits per weight, which is very close to the theoretical 1.58 bits for a ternary data type. Otherwise, not sure what to recommend. Any of the quantization types should be OK for Llama-3.2-1B/3B on an RPi 5. If you are new to the subject, it might be better to look into the simpler quantization types (e.g., QX_K) first.
I like the IQ1_BN quant. It's good and I am using it. Is there a way we can make this support function calling?
👤 ikawrakow commented on 2025-06-09 at 11:01:33:
See #407
👤 ikawrakow commented on 2025-06-14 at 12:01:58:
I think we can close it.