Mirror of https://github.com/ikawrakow/ik_llama.cpp.git
Synced 2026-05-02 04:11:41 +00:00
* Enable IQ4_NL for V-cache in token generation
* We don't need these
* Update printout of allowed quantized KV-cache combinations
* Add IQ4_NL + IQ4_NL to FA. This is a better alternative than Q4_0 + Q4_0 for the VRAM poor.
* Remove file added by mistake
* Fix typo, which is not really a bug

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
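The IQ4_NL + IQ4_NL flash-attention combination described above would be selected at run time through the quantized KV-cache options. A minimal usage sketch, assuming ik_llama.cpp keeps the upstream llama.cpp flag spellings (`-fa`, `-ctk`, `-ctv`) and accepts `iq4_nl` as a cache type name; the model path is a placeholder:

```shell
# Sketch only: the model path and the "iq4_nl" cache-type spelling are
# assumptions. -fa enables flash attention (required for a quantized V-cache);
# -ctk / -ctv set the K- and V-cache quantization types.
./llama-cli -m model.gguf -fa -ctk iq4_nl -ctv iq4_nl -p "Hello"
```

Per the commit message, this pairing targets low-VRAM setups where Q4_0 + Q4_0 was previously the only 4-bit option.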
6 lines
181 B
Plaintext
// This file has been autogenerated by generate_cu_files.py, do not edit manually.

#include "../fattn-vec-f16.cuh"

DECL_FATTN_VEC_F16_CASE(128, GGML_TYPE_Q4_1, GGML_TYPE_IQ4_NL);