ik_llama.cpp/ggml/src
Iwan Kawrakow 4941c043bb Improve gemv for bf16_r16
It is better to process one "row" at a time and to have
4 accumulators. I guess this allows better interleaving of
load and fmadd instructions. We get ~10% better performance
for 1 thread, and fully saturate memory bandwidth at 2 threads
with a ~3.5% better performance (4.4 vs 4.25 t/s for L3-8B).
2025-01-23 08:29:48 +02:00
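
To make the idea in the commit message concrete, here is a minimal sketch of a row-at-a-time matrix-vector product with four independent accumulators. It is an illustration under stated assumptions, not the code from this commit: the real kernel works on the interleaved bf16_r16 layout with SIMD fmadd instructions, while this sketch uses plain `float` and scalar arithmetic. The names `dot_4acc` and `gemv_rowwise` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Dot product of one matrix row with x, using four independent partial sums.
// Separate accumulators shorten the dependency chain between successive
// multiply-adds, which gives the CPU room to overlap loads with arithmetic.
// (Assumption: plain float data; the commit itself targets bf16_r16.)
float dot_4acc(const float* row, const float* x, std::size_t n) {
    float acc0 = 0.0f, acc1 = 0.0f, acc2 = 0.0f, acc3 = 0.0f;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        acc0 += row[i + 0] * x[i + 0];
        acc1 += row[i + 1] * x[i + 1];
        acc2 += row[i + 2] * x[i + 2];
        acc3 += row[i + 3] * x[i + 3];
    }
    // Handle the remaining 0..3 elements.
    for (; i < n; ++i) {
        acc0 += row[i] * x[i];
    }
    return (acc0 + acc1) + (acc2 + acc3);
}

// y = A * x for a row-major matrix A of size n_rows x n_cols,
// processing one row at a time.
std::vector<float> gemv_rowwise(const std::vector<float>& a,
                                std::size_t n_rows, std::size_t n_cols,
                                const std::vector<float>& x) {
    assert(a.size() == n_rows * n_cols);
    assert(x.size() == n_cols);
    std::vector<float> y(n_rows);
    for (std::size_t r = 0; r < n_rows; ++r) {
        y[r] = dot_4acc(a.data() + r * n_cols, x.data(), n_cols);
    }
    return y;
}
```

Whether four accumulators is the best count depends on the hardware's fmadd latency and throughput; the figure of four here simply follows the commit message.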