ik_llama.cpp/ggml
Iwan Kawrakow 7784c8928f Per row scales - CUDA
The only place left where unnecessary assumptions are made is the Flash
Attention code. Since we are not using any quants with per-row scales for
the quantized KV cache, it should be OK for now.
2024-09-25 13:10:34 +03:00