ik_llama.cpp/src at 7c5a91daf1f8e9da889b12e722c1601ca3affb8c - ik_llama.cpp - Public git mirror

ikawrakow/ik_llama.cpp

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-02-26 08:04:09 +00:00


Kawrakow 7c5a91daf1 Enable IQ4_NL for KV-cache in token generation using Flash Attention (#99)

* Enable IQ4_NL for V-cache in token generation

* We don't need these

* Update printout of allowed quantized KV-cache combinations

* Add IQ4_NL + IQ4_NL to FA

This is a better alternative to Q4_0 + Q4_0 for the VRAM-poor.

* Remove file added by mistake

* Fix typo, which is not really a bug

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>

2024-10-21 12:16:54 +02:00


Merge mainline - Aug 12 2024 (#17)

2024-08-12 15:14:32 +02:00

Enable IQ4_NL for KV-cache in token generation using Flash Attention (#99)

2024-10-21 12:16:54 +02:00

Merge mainline - Aug 12 2024 (#17)

2024-08-12 15:14:32 +02:00

Attempt to blindly fix Windows build failure (#93)

2024-10-19 11:43:04 +02:00

kompute @ 4565194ed7

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

kompute-shaders

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

Merge mainline - Aug 12 2024 (#17)

2024-08-12 15:14:32 +02:00

CMakeLists.txt

Enable IQ4_NL for KV-cache in token generation using Flash Attention (#99)

2024-10-21 12:16:54 +02:00

ggml-aarch64.c

Merge mainline - Aug 12 2024 (#17)

2024-08-12 15:14:32 +02:00

ggml-aarch64.h

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

ggml-alloc.c

Merge mainline - Aug 12 2024 (#17)

2024-08-12 15:14:32 +02:00

ggml-backend-impl.h

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

ggml-backend.c

Avoid rebuild of GGML graph for each token (#98)

2024-10-20 08:36:16 +02:00

ggml-blas.cpp

Merge mainline - Aug 12 2024 (#17)

2024-08-12 15:14:32 +02:00

ggml-cann.cpp

Merge mainline - Aug 12 2024 (#17)

2024-08-12 15:14:32 +02:00

ggml-common.h

Adding IQ4_KSS: 4.0 bpw quants (#89)

2024-10-16 15:18:26 +03:00

ggml-cuda.cu

Adding IQ4_KSS: 4.0 bpw quants (#89)

2024-10-16 15:18:26 +03:00

ggml-impl.h

Merge mainline - Aug 12 2024 (#17)

2024-08-12 15:14:32 +02:00

ggml-kompute.cpp

Merge mainline - Aug 12 2024 (#17)

2024-08-12 15:14:32 +02:00

ggml-metal.m

Adding IQ4_KSS: 4.0 bpw quants (#89)

2024-10-16 15:18:26 +03:00

ggml-metal.metal

Adding IQ4_KSS: 4.0 bpw quants (#89)

2024-10-16 15:18:26 +03:00

ggml-quants.c

Adding IQ4_KSS: 4.0 bpw quants (#89)

2024-10-16 15:18:26 +03:00

ggml-quants.h

IQ2_KS: 2.1875 bpw non-linear quantization (#85)

2024-10-13 13:34:30 +03:00

ggml-rpc.cpp

Merge mainline - Aug 12 2024 (#17)

2024-08-12 15:14:32 +02:00

ggml-sycl.cpp

Merge mainline - Aug 12 2024 (#17)

2024-08-12 15:14:32 +02:00

ggml-vulkan.cpp

Merge mainline - Aug 12 2024 (#17)

2024-08-12 15:14:32 +02:00

ggml.c

Adding IQ4_KSS: 4.0 bpw quants (#89)

2024-10-16 15:18:26 +03:00