ik_llama.cpp

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-02-26 08:04:09 +00:00

Files

Iwan Kawrakow c4951cbc35 Softcap: WIP

Fuses scale + tanh + scale as used for softcapping in some
models.

CPU only for now. ~1.4% improvement for PP-512 on Gemma2-9b, no effect on TG.

Somewhat surprisingly the improvement does not increase as I
go to longer contexts. Gemma2 does softcap on K*Q, which grows
quadratically with context length, so I would have thought
the benefit from fusing scale, tanh, scale would increase.
But no, no luck.

2024-08-19 17:40:01 +03:00

ggml-alloc.h

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

ggml-backend.h

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

ggml-blas.h

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

ggml-cann.h

Merge mainline llama.cpp (#3)