Files
ik_llama.cpp/llama.cpp
Kawrakow ab18bc5e87 k_quants tuning for Falcon-7b (#2816)
* Make ggml-cuda.cu build with QK_K = 64

Using LLAMA_CUDA_FORCE_DMMV = ON and -nommq it runs and produces
a meaningful result.

* k_quants tuning for Falcon-7b

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2023-08-27 15:19:59 +03:00

221 KiB