ik_llama.cpp/ggml-cuda.cu at 4e58bb2f8db7e7aa409486aa99b78b0c42d771ca

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-04-28 02:11:50 +00:00

slaren 30671dbce8 ggml-cuda : perform cublas mat mul of quantized types as f16 (#3412)

* ggml-cuda : perform cublas matrix multiplication of quantized types as fp16

* rename CC_TURING to CC_VOLTA

* disable fp16 mat mul completely with multi GPU

2023-09-30 18:12:57 +02:00

274 KiB
