ik_llama.cpp/github-data/pull_requests/56 - BF16 support on Metal.md
2025-07-23 13:31:53 +02:00


🔀 #56 - BF16 support on Metal

Author ikawrakow
State Closed
Created 2024-09-16
Updated 2024-09-17

Description

It is slightly slower than fp16, but definitely a massive improvement compared to not having bf16 support at all. I didn't put any effort into optimizing the matrix x vector kernel, so bf16 TG performance can likely be improved further.

| model | size | params | backend | ngl | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| llama 8B BF16 | 14.96 GiB | 8.03 B | Metal | 100 | pp512 | 538.84 ± 0.26 |
| llama 8B F16 | 14.96 GiB | 8.03 B | Metal | 100 | pp512 | 587.26 ± 0.39 |
| llama 8B BF16 | 14.96 GiB | 8.03 B | Metal | 100 | tg128 | 21.64 ± 0.05 |
| llama 8B F16 | 14.96 GiB | 8.03 B | Metal | 100 | tg128 | 21.77 ± 0.03 |