ik_llama.cpp

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-02-26 08:04:09 +00:00

Files

Iwan Kawrakow 6e5d728040 soft_cap_max: initial CPU version of fused softcap + soft_max

With this vanilla CPU implementation I'm already getting a ~3% speedup
for Gemma-2-9b and a prompt of 8192 tokens.

2024-08-21 13:31:56 +03:00

ggml-alloc.h      2024-07-27 07:55:01 +02:00
ggml-backend.h    2024-07-27 07:55:01 +02:00
ggml-blas.h       2024-07-27 07:55:01 +02:00
ggml-cann.h       2024-07-27 07:55:01 +02:00
ggml-cuda.h       2024-08-12 15:14:32 +02:00
ggml-kompute.h    2024-07-27 07:55:01 +02:00
ggml-metal.h      2024-08-12 15:14:32 +02:00
ggml-rpc.h        2024-07-27 07:55:01 +02:00
ggml-sycl.h       2024-07-27 07:55:01 +02:00
ggml-vulkan.h     2024-07-27 07:55:01 +02:00
ggml.h            2024-08-21 13:31:56 +03:00