ik_llama.cpp/ggml
Iwan Kawrakow ead4c1e180 POC per row scale: add CUDA TODOs
Two places remain in ggml-cuda.cu where the row size is assumed
to be type_size * n_per_row / block_size. This does not affect
simple usage, but will cause problems when tensors are split
between GPUs.
2024-09-25 13:10:34 +03:00
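
The commit message contrasts the block-only row size formula with what a per-row scale needs. The following is a minimal sketch of that difference, not the code in ggml-cuda.cu. It assumes the per-row scale is stored as a few extra bytes once per row, and every name in it (quant_type_info, row_meta_size, the three functions) is hypothetical and chosen for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of a quantized type (illustrative, not ggml's API). */
typedef struct {
    size_t type_size;     /* bytes per quantized block */
    size_t block_size;    /* elements per quantized block */
    size_t row_meta_size; /* ASSUMPTION: extra bytes stored once per row, e.g. a scale */
} quant_type_info;

/* The formula named in the commit message: a row is nothing but blocks. */
size_t row_size_blocks_only(const quant_type_info *t, int64_t n_per_row) {
    assert(n_per_row % (int64_t)t->block_size == 0);
    return t->type_size * (size_t)n_per_row / t->block_size;
}

/* Row size when each row also carries per-row metadata (assumed layout). */
size_t row_size_with_meta(const quant_type_info *t, int64_t n_per_row) {
    return t->row_meta_size + row_size_blocks_only(t, n_per_row);
}

/* Byte offset of a given row, as needed when a tensor is cut at a row boundary. */
size_t row_offset(const quant_type_info *t, int64_t n_per_row, int64_t row) {
    return (size_t)row * row_size_with_meta(t, n_per_row);
}
```

Under these assumptions the two formulas agree only when row_meta_size is 0. Code that computes the starting offset of a row range from the block-only formula would drift by row_meta_size bytes per row, which is one plausible way a split between GPUs could go wrong.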