Files
ik_llama.cpp/iqk-quantize.h
Kawrakow 1f9541172f Bitnet(1.75 bpw): higher precision fp8 scale
Use 3 bits for the exponent and 5 bits for the mantissa.
This makes PPL to be the same as fp16 (but the previous
version with 4 bits for the exponent and mantissa was
good enough for any practical purposes).
2024-06-22 12:02:52 +03:00

1.0 KiB