ik_llama.cpp/llama_util.h at d00422bd62b261eaa02f51908f243e5cd145f126

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-04-28 10:21:48 +00:00

slaren 796c107b37 cuBLAS: use host pinned memory and dequantize while copying (#1207)

* cuBLAS: dequantize simultaneously while copying memory

* cuBLAS: use host pinned memory

* cuBLAS: improve ggml_compute_forward_mul_mat_f16_f32 with pinned memory

* cuBLAS: also pin kv cache

* fix rebase

2023-04-29 02:04:18 +02:00

12 KiB

Executable File
