ik_llama.cpp

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-02-26 08:04:09 +00:00

Files

Iwan Kawrakow 595d2ae32d iq6_k: slightly better Zen4 iqk_mul_mat

We now arrive at PP-512 = 147 t/s for LLaMA-3.1-8B.
TG-128 is 9.5 t/s. This is better than the last commit,
but still rather slow compared to Q6_K.

My last commit message was wrong: iq3_k also needs a fix
for overflow.

2024-08-09 16:00:31 +02:00

ggml-cann

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

ggml-cuda

iq6_k: CUDA dot product

2024-08-09 16:00:31 +02:00

ggml-sycl

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

iqk

iq6_k: slightly better Zen4 iqk_mul_mat

2024-08-09 16:00:31 +02:00

kompute @ 4565194ed7

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

kompute-shaders

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

llamafile

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

vulkan-shaders

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

CMakeLists.txt

Factor out iqk CUDA dot products

2024-08-01 09:38:06 +02:00

ggml-aarch64.c

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

ggml-aarch64.h

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

ggml-alloc.c

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

ggml-backend-impl.h

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

ggml-backend.c

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

ggml-blas.cpp

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

ggml-cann.cpp

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

ggml-common.h

iq6_k: CUDA dot product

2024-08-09 16:00:31 +02:00

ggml-cuda.cu

iq6_k: CUDA dequantize

2024-08-09 16:00:31 +02:00

ggml-impl.h

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

ggml-kompute.cpp

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

ggml-metal.m

Adding IQ2_TN for use with ternary models (#13)

2024-08-07 07:56:09 +02:00

ggml-metal.metal

Adding IQ2_TN for use with ternary models (#13)

2024-08-07 07:56:09 +02:00

ggml-quants.c

iq6_k: WIP (quantize/dequantize)

2024-08-09 16:00:31 +02:00

ggml-quants.h

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

ggml-rpc.cpp

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

ggml-sycl.cpp

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

ggml-vulkan.cpp

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

ggml.c

iq6_k: WIP (nothing works)

2024-08-09 16:00:31 +02:00