mirror of
https://github.com/ikawrakow/ik_llama.cpp.git
synced 2026-01-31 03:29:52 +00:00
* WIP
* WIP: still getting illegal memory access
* CUDA: MMQ for iq4_ks now works ~25% faster than dequantize+cuBLAS, ~10% slower than Q4_0 MMQ.

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>