mirror of
https://github.com/ikawrakow/ik_llama.cpp.git
synced 2026-04-23 16:09:18 +00:00
SER - Smart Expert Reduction (#239)
* A better way to measure the cost of ggml_barrier
* Smart expert selection
* Add ser option to llama-bench

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
@@ -386,6 +386,8 @@ extern "C" {
         int mla_attn; // whether to use MLA attention [EXPERIMENTAL]
         int attn_max_batch; // maximum batch size for attention computations [EXPERIMENTAL]
         bool fused_moe_up_gate; // whether to use fused MoE up/down op [EXPERIMENTAL]
+        int min_experts;
+        float thresh_experts;
 
         // Abort callback
         // if it returns true, execution of llama_decode() will be aborted
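The hunk above only adds the two fields `min_experts` and `thresh_experts`; it does not show how they are used. As a rough illustration of what a "smart expert reduction" rule could look like, here is a minimal sketch. It assumes that the routing weights of the selected experts are sorted in descending order, that an expert is dropped once its weight falls below `thresh_experts` times the largest weight, and that at least `min_experts` experts are always kept. The helper name `ser_num_experts` is hypothetical and is not taken from the repository.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper, NOT part of ik_llama.cpp.
// weights: routing weights of the already selected top-k experts.
// Returns how many of the leading experts would be kept.
// ASSUMPTION: experts with weight < thresh_experts * weights[0] are skipped,
// but never fewer than min_experts experts are used.
static std::size_t ser_num_experts(const std::vector<float> & weights,
                                   int min_experts, float thresh_experts) {
    // Precondition of this sketch: weights are sorted in descending order.
    assert(std::is_sorted(weights.begin(), weights.end(), std::greater<float>()));

    const std::size_t n = weights.size();
    if (n == 0) return 0;

    // ASSUMPTION: a non-positive min_experts means the reduction is disabled.
    if (min_experts <= 0) return n;

    // Always keep the first min_experts experts (capped at the number available).
    std::size_t keep = std::min<std::size_t>(n, static_cast<std::size_t>(min_experts));

    // Keep further experts only while they stay above the relative cutoff.
    const float cutoff = thresh_experts * weights[0];
    while (keep < n && weights[keep] >= cutoff) ++keep;
    return keep;
}
```

Under these assumptions, calling the helper with weights `{0.5, 0.3, 0.1, 0.05, 0.03, 0.02}`, `min_experts = 2` and `thresh_experts = 0.25` keeps only the first two experts, because the third weight is already below a quarter of the largest one. A `thresh_experts` of 0 keeps all experts, so the two fields together trade accuracy against the amount of expert computation skipped.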