ik_llama.cpp

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-02-26 08:04:09 +00:00

Files

Iwan Kawrakow 31ed9b331e WIP: plugging into ggml_compute_forward_flash_attn_ext_f16

Now everything is done in iqk_flash_helper_2.
It is slower than no FA: at 2048 tokens we get 167 t/s vs 176 t/s without FA.
This is better than Georgi's FA (138 t/s), but at 8192 tokens
we degrade to 93 t/s vs 134 t/s without FA.

2024-08-23 16:48:35 +03:00

cmake

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

include

Fused soft cap and SIMD-ified GeLU (#9)

2024-08-20 17:15:47 +03:00

src

WIP: plugging into ggml_compute_forward_flash_attn_ext_f16

2024-08-23 16:48:35 +03:00

.gitignore

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

CMakeLists.txt

Merge mainline - Aug 12 2024 (#17)

2024-08-12 15:14:32 +02:00