mirror of
https://github.com/ikawrakow/ik_llama.cpp.git
synced 2026-05-01 03:41:53 +00:00
Legacy quants conversion schemes in convert_hf_to_gguf.py (#449)
* Legacy quants conversion schemes in convert_hf_to_gguf.py This, notably in order to make smaller conversions to generate an iMatrix file. `Q4_0`,`Q4_1` are here using embeddings, output, attn_k and attn_v in q5_0. `Q5_0`,`Q5_1` are here using embeddings, output, attn_k and attn_v in q8_0. Adapted from the following llama.cpp mainline PR : https://github.com/ggml-org/llama.cpp/pull/9022 Original author @chentyjpm Also, 2 forgotten mentions of FTYPE IQ3_KL in llama.cpp file. * forgotten IQ5_KS case mention
This commit is contained in:
@@ -652,6 +652,7 @@ bool ggml_cuda_mmvq_type_supported(ggml_type src0_type) {
|
||||
case GGML_TYPE_IQ4_KSS:
|
||||
case GGML_TYPE_IQ2_KS:
|
||||
case GGML_TYPE_IQ5_K:
|
||||
case GGML_TYPE_IQ5_KS:
|
||||
case GGML_TYPE_IQ6_K:
|
||||
case GGML_TYPE_IQ3_S:
|
||||
return true;
|
||||
|
||||
Reference in New Issue
Block a user