* Legacy quants conversion schemes in convert_hf_to_gguf.py
This is notably intended to produce smaller conversions for generating an iMatrix file.
`Q4_0` and `Q4_1` here use q5_0 for the embeddings, output, attn_k and attn_v tensors.
`Q5_0` and `Q5_1` here use q8_0 for the embeddings, output, attn_k and attn_v tensors.
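The per-tensor override described above could be sketched as follows. This is a minimal illustration of the idea, not the actual convert_hf_to_gguf.py code; the helper name, the suffix list, and the mapping structure are assumptions for demonstration.

```python
# Illustrative sketch only: names and structure are assumptions,
# not the real convert_hf_to_gguf.py implementation.

# Tensors bumped to a higher-precision legacy quant for each target ftype.
SENSITIVE_SUFFIXES = ("token_embd.weight", "output.weight",
                      "attn_k.weight", "attn_v.weight")

# Target ftype -> quant used for the sensitive tensors above.
UPGRADE_MAP = {
    "Q4_0": "Q5_0",
    "Q4_1": "Q5_0",
    "Q5_0": "Q8_0",
    "Q5_1": "Q8_0",
}

def tensor_quant(ftype: str, tensor_name: str) -> str:
    """Return the quant type to use for one tensor under a legacy ftype."""
    if ftype in UPGRADE_MAP and tensor_name.endswith(SENSITIVE_SUFFIXES):
        return UPGRADE_MAP[ftype]
    return ftype
```

For example, under a `Q4_0` target, `blk.0.attn_v.weight` would be written as q5_0 while `blk.0.ffn_down.weight` stays q4_0.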
Adapted from the following llama.cpp mainline PR: https://github.com/ggml-org/llama.cpp/pull/9022
Original author: @chentyjpm
Also, fix two forgotten mentions of FTYPE IQ3_KL in the llama.cpp file.
* Add a forgotten IQ5_KS case mention