ik_llama.cpp/ggml/src/ggml-cuda/template-instances/fattn-vec-f16-instance-hs128-q4_1-iq4_nl.cu
Kawrakow dbf951df15 Enable IQ4_NL for KV-cache in token generation using Flash Attention (#99)
* Enable IQ4_NL for V-cache in token generation

* We don't need these

* Update printout of allowed quantized KV-cache combinations

* Add IQ4_NL + IQ4_NL to FA

This is a better alternative than Q4_0 + Q4_0 for the VRAM poor.

* Remove file added by mistake

* Fix typo, which is not really a bug

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2024-10-21 12:16:54 +02:00

// This file has been autogenerated by generate_cu_files.py, do not edit manually.
#include "../fattn-vec-f16.cuh"
DECL_FATTN_VEC_F16_CASE(128, GGML_TYPE_Q4_1, GGML_TYPE_IQ4_NL);
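
The single macro line above presumably expands to an explicit template instantiation for head size 128 with a Q4_1 K-cache and an IQ4_NL V-cache, so that this combination is compiled in its own translation unit. The macro definition itself is not shown here, so the following is only a minimal sketch of that pattern under that assumption. Every name in it (`demo_type`, `demo_fattn_case`, `DEMO_DECL_CASE`) is hypothetical and is not part of ggml.

```cpp
#include <cassert>

// Hypothetical stand-in for the ggml type enum; the values are illustrative only.
enum demo_type { DEMO_TYPE_Q4_1, DEMO_TYPE_IQ4_NL };

// Hypothetical function template parameterized like the macro arguments:
// head size D, K-cache type, V-cache type. It returns a tag encoding its
// template parameters so the chosen instantiation can be checked.
template <int D, demo_type type_K, demo_type type_V>
int demo_fattn_case() {
    return D * 100 + static_cast<int>(type_K) * 10 + static_cast<int>(type_V);
}

// Hypothetical macro: expands to an explicit instantiation definition,
// forcing the compiler to emit code for exactly this parameter combination.
#define DEMO_DECL_CASE(D, type_K, type_V) \
    template int demo_fattn_case<D, type_K, type_V>()

// One instance per file, mirroring the one-line layout of the generated file.
DEMO_DECL_CASE(128, DEMO_TYPE_Q4_1, DEMO_TYPE_IQ4_NL);
```

Keeping one instantiation per file lets a build system compile the combinations in parallel and lets a generator script add or drop combinations without touching the kernel source.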