mirror of
https://github.com/ikawrakow/ik_llama.cpp.git
synced 2026-02-24 15:14:10 +00:00
Zen4 Flash Attention - bf16 support (#38)
* Zen4 Flash Attention: WIP bf16
* Zen4 Flash Attention: bf16 seems to be working
* Zen4 Flash Attention: improving bf16
* Zen4 Flash Attention: improving bf16

  It is better (slightly faster) to first convert Q to bf16 before processing each block of q_step rows. This requires D*q_step*sizeof(bf16) bytes, which is at most 4 kB for the head sizes we support, so we can allocate the buffer on the stack instead of reserving and passing a work buffer in ggml.

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
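
The stack-allocation idea described in the commit message can be sketched roughly as follows. This is an illustrative sketch only, not the code from this commit: `bf16_t`, `float_to_bf16`, `bf16_to_float`, and `qk_block` are hypothetical names, the conversion is a plain scalar round-to-nearest-even rather than the Zen4 vector path, and the 4096-byte bound is taken directly from the "at most 4 kB" statement above.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a bf16 value: the upper 16 bits of an IEEE-754 float.
struct bf16_t { uint16_t bits; };

// Scalar float -> bf16 conversion with round-to-nearest-even (assumed rounding mode).
static inline bf16_t float_to_bf16(float f) {
    uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    if ((u & 0x7fffffffu) > 0x7f800000u) {
        // NaN: keep the sign and exponent, force a quiet-NaN mantissa bit.
        return { static_cast<uint16_t>((u >> 16) | 64u) };
    }
    u += 0x7fffu + ((u >> 16) & 1u);  // round to nearest, ties to even
    return { static_cast<uint16_t>(u >> 16) };
}

static inline float bf16_to_float(bf16_t h) {
    uint32_t u = static_cast<uint32_t>(h.bits) << 16;
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}

// Convert up to q_step rows of Q (float, row stride in elements) into a
// stack buffer of bf16, then compute one dot product per row against a
// single bf16 K row. D is the head size.
template <int D, int q_step>
void qk_block(const float * q, std::size_t stride_q, int nrows,
              const bf16_t * k_row, float * out) {
    // The buffer needs D*q_step*sizeof(bf16) bytes; small enough for the stack.
    static_assert(D * q_step * sizeof(bf16_t) <= 4096, "Q block exceeds 4 kB");
    bf16_t q_bf16[D * q_step];

    // Step 1: convert the whole block of Q rows once, up front.
    for (int r = 0; r < nrows; ++r) {
        for (int d = 0; d < D; ++d) {
            q_bf16[r * D + d] = float_to_bf16(q[r * stride_q + d]);
        }
    }
    // Step 2: process the converted block (here: a scalar Q*K dot product).
    for (int r = 0; r < nrows; ++r) {
        float sum = 0.0f;
        for (int d = 0; d < D; ++d) {
            sum += bf16_to_float(q_bf16[r * D + d]) * bf16_to_float(k_row[d]);
        }
        out[r] = sum;
    }
}
```

With, for example, `D = 128` and `q_step = 16` (values chosen for illustration, not taken from the commit), the buffer is exactly 4096 bytes, so no work buffer has to be reserved by the caller.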
@@ -2221,6 +2221,9 @@ static ggml_type kv_cache_type_from_str(const std::string & s) {
     if (s == "f16") {
         return GGML_TYPE_F16;
     }
+    if (s == "bf16") {
+        return GGML_TYPE_BF16;
+    }
     if (s == "q8_0") {
         return GGML_TYPE_Q8_0;
     }