ik_llama.cpp/src at 43e65a672a98d931998559785b58f1e980e87f54 - ik_llama.cpp - Public git mirror

ikawrakow/ik_llama.cpp

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-02-25 23:54:10 +00:00

Kawrakow 43e65a672a Faster IQ4_XS_R4 on Zen4 (#128)

* Faster iq4_xs_r4 on Zen4

The trick is to simply prepare the Q8 block sums for
blocks of 32 as floats. This brings PP-512 up to 254.6 t/s
from 224 t/s.

* Fix broken matrix x vector product on Zen4

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>

2024-12-08 15:27:13 +01:00
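
The idea in the commit message above (preparing the Q8 block sums for blocks of 32 as floats) can be illustrated with a minimal scalar sketch. Everything named here is an assumption made for illustration: `Q8Block` and `block_sums_as_floats` are hypothetical and are not taken from ik_llama.cpp, whose actual kernels are SIMD code with their own block layouts. The sketch only shows what "a block sum stored as a float" could mean: the integer sum of the 32 quants in a block, multiplied by that block's float scale.

```cpp
#include <array>
#include <cstdint>
#include <vector>

constexpr int kBlockSize = 32;  // block size mentioned in the commit message

// Hypothetical Q8 block: 32 signed 8-bit quants sharing one float scale.
// This layout is an assumption, not the repository's actual struct.
struct Q8Block {
    float d;                             // per-block scale
    std::array<int8_t, kBlockSize> qs;   // quantized values
};

// For each block, compute d * sum(qs) and store it as a float.
std::vector<float> block_sums_as_floats(const std::vector<Q8Block>& blocks) {
    std::vector<float> sums;
    sums.reserve(blocks.size());
    for (const auto& b : blocks) {
        int32_t isum = 0;                // accumulate in 32 bits to avoid int8 overflow
        for (int8_t q : b.qs) isum += q;
        sums.push_back(b.d * static_cast<float>(isum));
    }
    return sums;
}
```

In a matrix multiplication, sums like these would presumably be computed once per activation row and then reused for every weight row, so the inner loop can consume them directly as floats.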

..

Merge mainline - Aug 12 2024 (#17)

2024-08-12 15:14:32 +02:00

MMQ for Q6_0 (#115)

2024-11-21 07:12:11 +01:00

Merge mainline - Aug 12 2024 (#17)

2024-08-12 15:14:32 +02:00

Faster IQ4_XS_R4 on Zen4 (#128)

2024-12-08 15:27:13 +01:00

kompute @ 4565194ed7

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

kompute-shaders

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

Merge mainline - Aug 12 2024 (#17)

2024-08-12 15:14:32 +02:00

CMakeLists.txt

Enable q6_0 for flash attention (#101)

2024-10-22 11:34:49 +02:00

ggml-aarch64.c

Merge mainline - Aug 12 2024 (#17)

2024-08-12 15:14:32 +02:00

ggml-aarch64.h

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

ggml-alloc.c

Merge mainline - Aug 12 2024 (#17)

2024-08-12 15:14:32 +02:00

ggml-backend-impl.h

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

ggml-backend.c

Bitnet changes (#106)

2024-10-25 13:08:43 +02:00

ggml-blas.cpp

Merge mainline - Aug 12 2024 (#17)

2024-08-12 15:14:32 +02:00

ggml-cann.cpp

Merge mainline - Aug 12 2024 (#17)

2024-08-12 15:14:32 +02:00

ggml-common.h

Rename iq4_nl_x4 to iq4_nl_r4 (#126)

2024-12-08 09:34:42 +01:00

ggml-cuda.cu

Faster MoE inference (#112)

2024-10-31 12:05:27 +01:00

ggml-impl.h

Merge mainline - Aug 12 2024 (#17)

2024-08-12 15:14:32 +02:00

ggml-kompute.cpp

Merge mainline - Aug 12 2024 (#17)

2024-08-12 15:14:32 +02:00

ggml-metal.m

Faster MoE inference (#112)

2024-10-31 12:05:27 +01:00

ggml-metal.metal

Faster MoE inference (#112)

2024-10-31 12:05:27 +01:00

ggml-quants.c

Rename iq4_nl_x4 to iq4_nl_r4 (#126)

2024-12-08 09:34:42 +01:00

ggml-quants.h

iq2_bn_r4: fastest Bitnet CPU implementation on the planet (#124)

2024-12-06 12:15:39 +01:00

ggml-rpc.cpp

Merge mainline - Aug 12 2024 (#17)

2024-08-12 15:14:32 +02:00

ggml-sycl.cpp

Merge mainline - Aug 12 2024 (#17)

2024-08-12 15:14:32 +02:00

ggml-vulkan.cpp

Merge mainline - Aug 12 2024 (#17)

2024-08-12 15:14:32 +02:00

ggml.c

Faster IQ4_XS_R4 on Zen4 (#128)

2024-12-08 15:27:13 +01:00