ik_llama.cpp/ggml/src/ggml-cuda/template-instances/fattn-vec-f16-instance-hs128-q4_1-iq4_nl.cu at dbf951df1594a3dec36eca9ab81a0f7ba81b11cd - ik_llama.cpp - Public git mirror

ikawrakow/ik_llama.cpp

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-05-02 04:11:41 +00:00


Kawrakow dbf951df15 Enable IQ4_NL for KV-cache in token generation using Flash Attention (#99)

* Enable IQ4_NL for V-cache in token generation

* We don't need these

* Update printout of allowed quantized KV-cache combinations

* Add IQ4_NL + IQ4_NL to FA

This is a better alternative than Q4_0 + Q4_0 for the VRAM poor.

* Remove file added by mistake

* Fix typo, which is not really a bug

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>

2024-10-21 12:16:54 +02:00
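
The commit message compares IQ4_NL + IQ4_NL against Q4_0 + Q4_0 for users short on VRAM. As a rough illustration of the memory side of that comparison, the sketch below computes the K plus V cache bytes per token and per attention head for a head size of 128. The block sizes follow the ggml block layouts (32 values per block); the struct and function names are hypothetical helpers, not part of ggml or ik_llama.cpp.

```cpp
#include <cstddef>

// Hypothetical helper type: how many values one block holds and how many bytes it occupies.
struct block_layout {
    std::size_t values;
    std::size_t bytes;
};

// Sizes per 32-value block, following the ggml block layouts:
//   F16:    32 * 2 B                                = 64 B
//   Q4_0:   fp16 scale (2 B) + 16 B of 4-bit quants = 18 B
//   Q4_1:   fp16 scale + fp16 min (4 B) + 16 B      = 20 B
//   IQ4_NL: fp16 scale (2 B) + 16 B of 4-bit codes  = 18 B
constexpr block_layout kF16   {32, 64};
constexpr block_layout kQ4_0  {32, 18};
constexpr block_layout kQ4_1  {32, 20};
constexpr block_layout kIQ4_NL{32, 18};

// Bytes needed to store one row of n values (n must be a multiple of the block size).
constexpr std::size_t row_bytes(block_layout t, std::size_t n) {
    return n / t.values * t.bytes;
}

// Bytes of K cache plus V cache for one token and one attention head.
constexpr std::size_t kv_bytes_per_token_head(block_layout k, block_layout v,
                                              std::size_t head_size) {
    return row_bytes(k, head_size) + row_bytes(v, head_size);
}
```

Under these layouts, IQ4_NL + IQ4_NL and Q4_0 + Q4_0 both need 144 bytes per token and head at head size 128, the Q4_1 + IQ4_NL pair from this file needs 152, and F16 + F16 needs 512. Because the footprint of the two 4-bit pairs is identical, the "better alternative" in the commit message presumably refers to accuracy rather than size; that is the commit author's claim and is not demonstrated by this sketch.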

6 lines

181 B

Plaintext


 // This file has been autogenerated by generate_cu_files.py, do not edit manually.
 #include "../fattn-vec-f16.cuh"
 DECL_FATTN_VEC_F16_CASE(128, GGML_TYPE_Q4_1, GGML_TYPE_IQ4_NL);
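
The single macro line above is typical of a "one explicit template instantiation per file" layout, which lets each (head size, K type, V type) combination compile in its own translation unit. The actual definition of DECL_FATTN_VEC_F16_CASE lives in fattn-vec-f16.cuh and is not shown on this page, so the following is only a minimal host-side sketch of the pattern. Every name in it (toy_type, toy_fattn_vec_case, DECL_TOY_FATTN_VEC_CASE) is a hypothetical stand-in, not the ggml API.

```cpp
// Sketch only: all names below are hypothetical stand-ins, not the ggml API.

// Stand-in for the ggml type enum used as a template parameter.
enum toy_type { TOY_TYPE_Q4_1, TOY_TYPE_IQ4_NL };

// Function template that would normally be defined once in a shared header.
// A real launcher would configure and start a CUDA kernel specialised for
// head size D and the given K/V cache types; here it just returns D.
template <int D, toy_type type_K, toy_type type_V>
int toy_fattn_vec_case() {
    return D;
}

// Macro that expands to an explicit template instantiation. Placing one such
// line in each generated file forces exactly one combination to be compiled
// in that translation unit.
#define DECL_TOY_FATTN_VEC_CASE(D, type_K, type_V) \
    template int toy_fattn_vec_case<D, type_K, type_V>()

// Mirrors the combination in this file: head size 128, Q4_1 keys, IQ4_NL values.
DECL_TOY_FATTN_VEC_CASE(128, TOY_TYPE_Q4_1, TOY_TYPE_IQ4_NL);
```

Splitting instantiations this way usually allows the build to compile the combinations in parallel and lets a generator script add or drop combinations by adding or removing files, which is consistent with the "autogenerated by generate_cu_files.py" header comment.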