ik_llama.cpp/ggml-cuda/common.cuh at 6fcd1331efbfbb89c8c96eba2321bb7b4d0c40e4

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-04-28 10:21:48 +00:00

Johannes Gäßler 1f0dabda8d CUDA: use tensor cores for MMQ (#7676)

* CUDA: int8 tensor cores for MMQ (legacy quants)

* fix out-of-bounds writes

* __builtin_assume -> GGML_CUDA_ASSUME

* fix writeback returning too early

2024-06-10 11:45:13 +02:00

File size: 28 KiB
