ik_llama.cpp/llama.cpp at 8c0a10e64dbf60fd9946c0cd5e6f59690800b123

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-03-11 14:30:02 +00:00

Kerfuffle 4f0154b0ba llama : support requantizing models instead of only allowing quantization from 16/32bit (#1691)

* Add support for quantizing already quantized models

* Threaded dequantizing and f16 to f32 conversion

* Clean up thread blocks with spares calculation a bit

* Use std::runtime_error exceptions.

2023-06-10 10:59:17 +03:00

116 KiB
