ik_llama.cpp/ggml.c at 796c107b37ca266c8a33eb63698b272da99f573f

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-04-27 18:01:45 +00:00

slaren 796c107b37 cuBLAS: use host pinned memory and dequantize while copying (#1207)

* cuBLAS: dequantize simultaneously while copying memory

* cuBLAS: use host pinned memory

* cuBLAS: improve ggml_compute_forward_mul_mat_f16_f32 with pinned memory

* cuBLAS: also pin kv cache

* fix rebase

2023-04-29 02:04:18 +02:00

408 KiB
