ik_llama.cpp

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-03-05 19:40:19 +00:00

Files

Kawrakow 6a56d5075d Faster prompt processing for IQ2_KS, IQ2_K, IQ2_K_R4 (#593)

* cuda: faster MMQ for iq2_ks, iq2_k, iq2_k_r4

* Lookup is still better for MMQ if we get 4 values at once

* Minor

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>

2025-07-08 19:44:48 +02:00

cmake

Merge vulkan code from mainline up to commit of 6/28/2025 (#563)

2025-07-02 08:49:42 +02:00

ggml-cann

Merge mainline - Aug 12 2024 (#17)

2024-08-12 15:14:32 +02:00

ggml-cuda

Faster prompt processing for IQ2_KS, IQ2_K, IQ2_K_R4 (#593)

2025-07-08 19:44:48 +02:00

ggml-sycl

Merge mainline - Aug 12 2024 (#17)

2024-08-12 15:14:32 +02:00

iqk

Adding IQ3_KS quants (#566)

2025-07-02 09:27:47 +02:00

kompute @ 4565194ed7

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

kompute-shaders

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

llamafile

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

vulkan-shaders

Vulkan: flash attention for DeepSeek models (#584)

2025-07-05 15:14:12 +02:00

CMakeLists.txt

Fix CMakeLists (#571)

2025-07-02 16:11:56 +02:00

ggml-aarch64.c

Merge mainline - Aug 12 2024 (#17)

2024-08-12 15:14:32 +02:00

ggml-aarch64.h

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

ggml-alloc.c

Merge vulkan code from mainline up to commit of 6/28/2025 (#563)

2025-07-02 08:49:42 +02:00

ggml-backend-impl.h

Merge vulkan code from mainline up to commit of 6/28/2025 (#563)

2025-07-02 08:49:42 +02:00

ggml-backend.c

Fix debug build failure with RPC off (#579)

2025-07-03 15:26:28 +02:00

ggml-blas.cpp

Merge mainline - Aug 12 2024 (#17)

2024-08-12 15:14:32 +02:00

ggml-cann.cpp

Merge vulkan code from mainline up to commit of 6/28/2025 (#563)

2025-07-02 08:49:42 +02:00

ggml-common.h

Adding IQ3_KS quants (#566)

2025-07-02 09:27:47 +02:00

ggml-cuda.cu

CUDA: small PP performance improvement for MoE models (#589)

2025-07-07 07:23:12 +02:00

ggml-impl.h

Merge mainline - Aug 12 2024 (#17)

2024-08-12 15:14:32 +02:00

ggml-kompute.cpp

Merge vulkan code from mainline up to commit of 6/28/2025 (#563)

2025-07-02 08:49:42 +02:00

ggml-metal.m

Adding IQ3_KS quants (#566)

2025-07-02 09:27:47 +02:00

ggml-metal.metal

Adding IQ3_KS quants (#566)

2025-07-02 09:27:47 +02:00

ggml-quants.c

Adding IQ3_KS quants (#566)

2025-07-02 09:27:47 +02:00

ggml-quants.h

IQ1_M_R4: better 1.75 bpw quants (#187)

2025-02-06 14:08:52 +02:00

ggml-rpc.cpp

Merge vulkan code from mainline up to commit of 6/28/2025 (#563)

2025-07-02 08:49:42 +02:00

ggml-sycl.cpp

Merge vulkan code from mainline up to commit of 6/28/2025 (#563)

2025-07-02 08:49:42 +02:00

ggml-vulkan.cpp

Vulkan: flash attention for DeepSeek models (#584)

2025-07-05 15:14:12 +02:00

ggml.c

Adding IQ3_KS quants (#566)

2025-07-02 09:27:47 +02:00