From 89587d13df4e0339d4f4e1c81ba6fce77689a784 Mon Sep 17 00:00:00 2001
From: turboderp
Date: Sat, 16 Dec 2023 22:03:25 +0100
Subject: [PATCH] Update convert.py instructions

---
 doc/convert.md | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/doc/convert.md b/doc/convert.md
index 6f1d5b2..1a74f29 100644
--- a/doc/convert.md
+++ b/doc/convert.md
@@ -98,7 +98,6 @@ Convert a model and create a directory containing the quantized version with all
 python convert.py \
     -i /mnt/models/llama2-7b-fp16/ \
     -o /mnt/temp/exl2/ \
-    -c /mnt/datasets/parquet/wikitext-test.parquet \
     -cf /mnt/models/llama2-7b-exl2/3.0bpw/ \
     -b 3.0
 ```
@@ -110,7 +109,6 @@
 python convert.py \
     -i /mnt/models/llama2-7b-fp16/ \
     -o /mnt/temp/exl2/ \
     -nr \
-    -c /mnt/datasets/parquet/wikitext-test.parquet \
     -om /mnt/models/llama2-7b-exl2/measurement.json
 ```
@@ -121,7 +119,6 @@ python convert.py \
     -i /mnt/models/llama2-7b-fp16/ \
     -o /mnt/temp/exl2/ \
     -nr \
-    -c /mnt/datasets/parquet/wikitext-test.parquet \
     -m /mnt/models/llama2-7b-exl2/measurement.json \
     -cf /mnt/models/llama2-7b-exl2/4.0bpw/ \
     -b 4.0
@@ -130,12 +127,26 @@
 python convert.py \
     -i /mnt/models/llama2-7b-fp16/ \
     -o /mnt/temp/exl2/ \
     -nr \
-    -c /mnt/datasets/parquet/wikitext-test.parquet \
     -m /mnt/models/llama2-7b-exl2/measurement.json \
     -cf /mnt/models/llama2-7b-exl2/4.5bpw/ \
     -b 4.5
 ```
 
+### Notes
+
+- If the conversion script seems to stop on the "Solving..." step, give it a moment. It's attempting to find the
+combination of quantization parameters within the bits budget that minimizes the product of the measured errors for
+each individual layer, and the implementation is not very efficient.
+- During measurement and conversion of MoE models you may see a message like:
+`!! Warning: w2.7 has less than 10% calibration for 77/115 rows`. This happens when a particular expert isn't triggered
+often enough during the reference forward passes to collect a good amount of calibration data. It won't cause the
+conversion to fail, and it may not be a big deal at all, but GPTQ-style quantization of MoE models is very new, so I'm
+not yet sure whether it actually matters.
+- After conversion, the "calibration perplexity (quant)" is a perplexity calculation on a small sample of the
+calibration data as processed by the quantized model under construction. If it looks too high (30 or more),
+quantization likely didn't go well, and if it's unreasonably high (in the thousands, for instance), quantization failed
+catastrophically.
+
 ### Hardware requirements
 
 Roughly speaking, you'll need about 64 GB of RAM and 24 GB of VRAM to convert a 70B model, while 7B seems to require
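The "Solving..." step described in the patch's notes boils down to a constrained search: choose one quantization option per layer so that the overall bits budget is respected while the product of the measured per-layer errors is as small as possible. The following brute-force sketch illustrates that idea; the option tables and numbers are invented for illustration and are not taken from convert.py, whose actual solver handles far more options per layer:

```python
from itertools import product

# Hypothetical per-layer options as (bits per weight, measured error).
# In the real script these come from the measurement pass; the values
# below are made up for illustration only.
LAYER_OPTIONS = [
    [(2.5, 0.20), (4.0, 0.05), (6.0, 0.01)],  # layer 0
    [(2.5, 0.30), (4.0, 0.08), (6.0, 0.02)],  # layer 1
    [(2.5, 0.15), (4.0, 0.04), (6.0, 0.01)],  # layer 2
]

def solve(budget_bpw):
    """Pick one option per layer so that the average bits per weight
    stays within the budget and the product of the per-layer errors
    is minimized."""
    best_combo, best_err = None, float("inf")
    for combo in product(*LAYER_OPTIONS):
        avg_bits = sum(bits for bits, _ in combo) / len(combo)
        if avg_bits > budget_bpw:
            continue  # over the bits budget, skip
        err = 1.0
        for _, layer_err in combo:
            err *= layer_err
        if err < best_err:
            best_combo, best_err = combo, err
    return best_combo, best_err

combo, err = solve(4.0)
```

With many layers and many options each, an exhaustive search like this blows up combinatorially, which is why the real solving step can take a while even with a smarter implementation.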
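The "calibration perplexity (quant)" check from the notes is, in essence, the exponential of the mean per-token negative log-likelihood over the calibration sample. A minimal sketch of that computation and of the rule-of-thumb thresholds quoted above; the function names are hypothetical, not convert.py's:

```python
import math

def perplexity(token_nlls):
    # Perplexity is exp() of the mean negative log-likelihood per token.
    return math.exp(sum(token_nlls) / len(token_nlls))

def interpret(ppl):
    # Thresholds follow the rules of thumb in the notes above; they are
    # heuristics, not constants used by convert.py.
    if ppl >= 1000:
        return "quantization failed catastrophically"
    if ppl >= 30:
        return "quantization likely didn't go well"
    return "plausible"
```

A quantized-model perplexity close to what the full-precision model scores on the same sample suggests the quantization preserved the model's behavior there.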