Mirror of https://github.com/ikawrakow/ik_llama.cpp.git, synced 2026-03-09 21:40:22 +00:00
* Make ggml-cuda.cu build with QK_K = 64

  Using LLAMA_CUDA_FORCE_DMMV = ON and -nommq it runs and produces a meaningful result.

* k_quants tuning for Falcon-7b

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
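The build described above can be sketched roughly as follows. Only `LLAMA_CUDA_FORCE_DMMV = ON` and `-nommq` come from the commit message itself; every other option name, the model filename, and the CMake invocation style are assumptions for illustration and may not match the spellings used by this revision of the project.

```shell
# Hypothetical sketch, not the project's documented build procedure.
# LLAMA_CUDA_FORCE_DMMV and -nommq are taken from the commit message;
# the CUDA-enable flag, the QK_K=64 option name, and the model file
# are assumptions and may differ in the actual tree.
cmake -B build -DLLAMA_CUBLAS=ON -DLLAMA_CUDA_FORCE_DMMV=ON
cmake --build build --config Release

# Run with mul_mat_q kernels disabled, as in the commit message.
./build/bin/main -m falcon-7b-q4_k.gguf -p "Hello" -nommq
```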
221 KiB