iq2_tn: TriLM specific 2.0625 bpw quantization

Quantize/dequantize/scale dot product.

I get 46 t/s for TriLM-3.9B without any SIMD!
Finally a compiler doing a decent job auto-vectorizing the
scalar implementation.
This commit is contained in:
Iwan Kawrakow
2024-08-05 14:22:05 +03:00
parent b409c15363
commit 1b41d792ec
9 changed files with 157 additions and 3 deletions


@@ -174,6 +174,7 @@ extern "C" {
LLAMA_FTYPE_MOSTLY_IQ3_K = 39, // except 1d tensors
LLAMA_FTYPE_MOSTLY_IQ4_K = 40, // except 1d tensors
LLAMA_FTYPE_MOSTLY_IQ5_K = 41, // except 1d tensors
LLAMA_FTYPE_MOSTLY_IQ2_TN = 42, // except 1d tensors
LLAMA_FTYPE_GUESSED = 1024, // not specified in the model file
};