ik_llama.cpp/ggml/src at 465569dff8b49a195450a0eb1974fd72a32fcebc - ik_llama.cpp - Public git mirror

ikawrakow/ik_llama.cpp

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-03-05 19:40:19 +00:00

Kawrakow 465569dff8 Faster DeepSeek FA on CUDA (#408)

* New DeepSeek FlashMLA

Does not work because the RoPE portion is stored at the end
in our case, while in mainline it is stored at the beginning,
and the FA kernel assumes the mainline layout.

* Rearrange MLA K cache so it fits the new CUDA FA implementation

* constexpr and minor changes

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>

2025-05-12 07:49:00 +03:00
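
The layout mismatch described in the commit message can be illustrated with a small host-side sketch. This is not the code from the commit: `move_rope_to_front`, the flat `std::vector<float>` cache, and the parameters `row_dim` and `rope_dim` are hypothetical names chosen for illustration. It only shows what moving the RoPE portion from the end of each K cache row to the beginning means, assuming each row is stored contiguously.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, for illustration only (not the ik_llama.cpp implementation).
// Each row of `cache` holds `row_dim` values:
//   before: [ non-RoPE part (row_dim - rope_dim) | RoPE part (rope_dim) ]
//   after:  [ RoPE part (rope_dim) | non-RoPE part (row_dim - rope_dim) ]
inline void move_rope_to_front(std::vector<float>& cache,
                               std::size_t row_dim,
                               std::size_t rope_dim) {
    // Preconditions assumed by this sketch.
    assert(row_dim > 0 && rope_dim <= row_dim);
    assert(cache.size() % row_dim == 0);

    const std::size_t n_rows = cache.size() / row_dim;
    for (std::size_t r = 0; r < n_rows; ++r) {
        auto first  = cache.begin() + static_cast<std::ptrdiff_t>(r * row_dim);
        auto middle = first + static_cast<std::ptrdiff_t>(row_dim - rope_dim);
        auto last   = first + static_cast<std::ptrdiff_t>(row_dim);
        // std::rotate makes the element at `middle` the new first element,
        // so the trailing RoPE block ends up at the start of the row.
        std::rotate(first, middle, last);
    }
}
```

With two rows of four values and a one-element RoPE part, a row `{1, 2, 3, 9}` would become `{9, 1, 2, 3}`. A real implementation would more likely write the cache in the desired order up front rather than permute it afterwards; the rotate is used here only because it makes the two layouts easy to compare.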

Merge mainline - Aug 12 2024 (#17)

2024-08-12 15:14:32 +02:00

Faster DeepSeek FA on CUDA (#408)

2025-05-12 07:49:00 +03:00

Merge mainline - Aug 12 2024 (#17)

2024-08-12 15:14:32 +02:00

Fix DeepSeek q8_0 cache (#391)

2025-05-07 12:06:49 +03:00

kompute @ 4565194ed7

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

kompute-shaders

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

Merge mainline - Aug 12 2024 (#17)

2024-08-12 15:14:32 +02:00

CMakeLists.txt

Faster DeepSeek FA on CUDA (#408)

2025-05-12 07:49:00 +03:00

ggml-aarch64.c

Merge mainline - Aug 12 2024 (#17)

2024-08-12 15:14:32 +02:00

ggml-aarch64.h

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

ggml-alloc.c

Fix ARM_NEON build failure due to q8_2 (#303)

2025-04-01 13:48:20 +02:00

ggml-backend-impl.h

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

ggml-backend.c

GPU offload policy (#405)

2025-05-12 07:47:46 +03:00

ggml-blas.cpp

Merge mainline - Aug 12 2024 (#17)

2024-08-12 15:14:32 +02:00

ggml-cann.cpp

Merge mainline - Aug 12 2024 (#17)

2024-08-12 15:14:32 +02:00

ggml-common.h

Add copyright notices (#317)

2025-04-07 10:43:26 +02:00

ggml-cuda.cu

GPU offload policy (#405)

2025-05-12 07:47:46 +03:00

ggml-impl.h

Merge mainline - Aug 12 2024 (#17)

2024-08-12 15:14:32 +02:00

ggml-kompute.cpp

Merge mainline - Aug 12 2024 (#17)

2024-08-12 15:14:32 +02:00

ggml-metal.m

Metal: FA and FlashMLA (#310)

2025-04-03 17:54:25 +02:00

ggml-metal.metal

Metal: FA and FlashMLA (#310)

2025-04-03 17:54:25 +02:00

ggml-quants.c

Improved IQ1_M quantization (#327)

2025-04-13 10:37:55 +02:00

ggml-quants.h

IQ1_M_R4: better 1.75 bpw quants (#187)

2025-02-06 14:08:52 +02:00

ggml-rpc.cpp

Merge mainline - Aug 12 2024 (#17)

2024-08-12 15:14:32 +02:00

ggml-sycl.cpp

Merge mainline - Aug 12 2024 (#17)

2024-08-12 15:14:32 +02:00

ggml-vulkan.cpp

Merge mainline - Aug 12 2024 (#17)

2024-08-12 15:14:32 +02:00

ggml.c

TG improvements for MoE models (#404)

2025-05-10 18:52:54 +03:00