mirror of
https://github.com/ikawrakow/ik_llama.cpp.git
synced 2026-03-06 12:00:29 +00:00
WIP KQ binary mask: make it a parameter, turn on via command line
It is a pain to implement the binary-mask-to-32-bit-value conversion on NEON and AVX2, so I decided to make the binary mask optional. There is also a commented-out (and not working) attempt for NEON in this commit.
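For orientation, the conversion mentioned above can be sketched in scalar C. This is not code from the commit: the helper name `expand_binary_mask_f32`, the LSB-first bit order, and the convention that a set bit means "position is visible" are all assumptions made for illustration. The sketch only shows what expanding one mask bit into one 32-bit float mask value means; a SIMD version has to do the same thing per lane.

```c
#include <math.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper (not taken from the commit): expand a packed binary
// mask, one bit per KQ position, into a 32-bit float mask.
// Assumed conventions: bits are read LSB-first within each 32-bit word,
// a set bit means "keep" (0.0f), a clear bit means "masked out" (-INFINITY).
static void expand_binary_mask_f32(const uint32_t * bits, float * dst, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        const uint32_t word = bits[i / 32];            // word holding bit i
        const uint32_t bit  = (word >> (i % 32)) & 1u; // extract bit i
        dst[i] = bit ? 0.0f : -INFINITY;               // one float per bit
    }
}
```

The packed form needs 1 bit per position instead of 32, which is the appeal of a binary mask; the cost is this per-element expansion wherever the float values are needed.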
@@ -340,6 +340,7 @@ extern "C" {
         bool embeddings;  // if true, extract embeddings (together with logits)
         bool offload_kqv; // whether to offload the KQV ops (including the KV cache) to GPU
         bool flash_attn;  // whether to use flash attention [EXPERIMENTAL]
+        bool binary_kq;   // whether to use binary KQ mask [EXPERIMENTAL]

         // Abort callback
         // if it returns true, execution of llama_decode() will be aborted