ik_llama.cpp/ggml/src at 4941c043bb137cd59760f1ed1ca5ae69d3b9da64 - ik_llama.cpp - Public git mirror

ikawrakow/ik_llama.cpp

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-03-06 03:50:08 +00:00

Iwan Kawrakow 4941c043bb Improve gemv for bf16_r16

It is better to process one "row" at a time and to have
4 accumulators. I guess this allows better interleaving of
load and fmadd instructions. We get ~10% better performance
for 1 thread, and fully saturate memory bandwidth at 2 threads
with ~3.5% better performance (4.4 vs 4.25 t/s for L3-8B).
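
The idea in the commit message can be illustrated with a minimal scalar sketch. This is not the code from the commit: the real kernel works on bf16 data in an interleaved `bf16_r16` layout with SIMD instructions, whereas the sketch below assumes plain row-major `float` data, and the names `dot_row_4acc` and `gemv_4acc` are hypothetical. It only shows the general pattern of handling one row at a time with four independent accumulators, so that consecutive multiply-adds do not all wait on the same running sum.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Dot product of one row with x using four independent accumulators.
// Each accumulator forms its own dependency chain, which gives the CPU
// room to overlap loads with multiply-adds. In a SIMD kernel each
// accumulator would be a vector register instead of a scalar.
static float dot_row_4acc(const float* row, const float* x, std::size_t n) {
    float acc0 = 0.0f, acc1 = 0.0f, acc2 = 0.0f, acc3 = 0.0f;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        acc0 += row[i + 0] * x[i + 0];
        acc1 += row[i + 1] * x[i + 1];
        acc2 += row[i + 2] * x[i + 2];
        acc3 += row[i + 3] * x[i + 3];
    }
    // Tail: leftover elements when n is not a multiple of 4.
    for (; i < n; ++i) {
        acc0 += row[i] * x[i];
    }
    // Combine the partial sums once, at the end of the row.
    return (acc0 + acc1) + (acc2 + acc3);
}

// y = A * x for a row-major matrix A with n_rows rows (assumed layout).
// Rows are processed one at a time.
std::vector<float> gemv_4acc(const std::vector<float>& a,
                             const std::vector<float>& x,
                             std::size_t n_rows) {
    assert(n_rows == 0 || a.size() % n_rows == 0);
    const std::size_t n_cols = n_rows ? a.size() / n_rows : 0;
    assert(x.size() == n_cols);
    std::vector<float> y(n_rows);
    for (std::size_t r = 0; r < n_rows; ++r) {
        y[r] = dot_row_4acc(a.data() + r * n_cols, x.data(), n_cols);
    }
    return y;
}
```

Whether four accumulators is the best count depends on the target CPU's FMA latency and throughput; the commit message reports four as the value that worked well in its measurements.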

2025-01-23 08:29:48 +02:00

Merge mainline - Aug 12 2024 (#17)

2024-08-12 15:14:32 +02:00

MMQ for Q6_0 (#115)

2024-11-21 07:12:11 +01:00

Merge mainline - Aug 12 2024 (#17)

2024-08-12 15:14:32 +02:00

Improve gemv for bf16_r16

2025-01-23 08:29:48 +02:00

kompute @ 4565194ed7

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

kompute-shaders

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

Merge mainline - Aug 12 2024 (#17)

2024-08-12 15:14:32 +02:00

CMakeLists.txt

Enable q6_0 for flash attention (#101)

2024-10-22 11:34:49 +02:00

ggml-aarch64.c

Merge mainline - Aug 12 2024 (#17)

2024-08-12 15:14:32 +02:00

ggml-aarch64.h

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

ggml-alloc.c

Merge mainline - Aug 12 2024 (#17)

2024-08-12 15:14:32 +02:00

ggml-backend-impl.h

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

ggml-backend.c

Bitnet changes (#106)

2024-10-25 13:08:43 +02:00

ggml-blas.cpp

Merge mainline - Aug 12 2024 (#17)

2024-08-12 15:14:32 +02:00

ggml-cann.cpp

Merge mainline - Aug 12 2024 (#17)

2024-08-12 15:14:32 +02:00

ggml-common.h

IQ3_S_R4 (#162)

2024-12-23 14:34:23 +01:00

ggml-cuda.cu

Faster MoE inference (#112)

2024-10-31 12:05:27 +01:00

ggml-impl.h

Merge mainline - Aug 12 2024 (#17)

2024-08-12 15:14:32 +02:00

ggml-kompute.cpp

Merge mainline - Aug 12 2024 (#17)

2024-08-12 15:14:32 +02:00

ggml-metal.m

Faster MoE inference (#112)

2024-10-31 12:05:27 +01:00

ggml-metal.metal

Faster MoE inference (#112)

2024-10-31 12:05:27 +01:00

ggml-quants.c

CPU Flash Attention improvements (#172)

2025-01-15 18:19:22 +02:00

ggml-quants.h

iq2_bn_r4: fastest Bitnet CPU implementation on the planet (#124)

2024-12-06 12:15:39 +01:00

ggml-rpc.cpp

Merge mainline - Aug 12 2024 (#17)

2024-08-12 15:14:32 +02:00

ggml-sycl.cpp

Merge mainline - Aug 12 2024 (#17)

2024-08-12 15:14:32 +02:00

ggml-vulkan.cpp

Merge mainline - Aug 12 2024 (#17)

2024-08-12 15:14:32 +02:00

ggml.c

More Flash Attention improvements (#173)

2025-01-20 08:57:38 +02:00