mirror of
https://github.com/ikawrakow/ik_llama.cpp.git
synced 2026-03-02 10:00:07 +00:00
This implements the ability to load, unload, and scale control vectors (representation engineering) mid-inference, following the existing task-queue pattern used by LoRA adapters.

New Endpoints:
- GET /control-vectors
- POST /control-vectors/load
- POST /control-vectors/unload
- POST /control-vectors/apply (handles scaling)

Technical Notes:
- Centralizes vector aggregation logic to share implementation between load, unload, and apply tasks.
- Vectors are applied globally to the model context.
- Enforces dimension validation on load to safely reject incompatible vectors.

Co-authored-by: Gapeleon <gapeleon@users.noreply.github.com>
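
The aggregation and dimension-validation notes above can be illustrated with a minimal sketch. It is not the code from this commit: the names `ControlVector`, `is_compatible`, and `aggregate` are hypothetical, and the flat layer-major layout (one block of `n_embd` floats per layer) is an assumption made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical in-memory form of one loaded control vector.
// Assumed layout: data holds n_layers * n_embd floats, layer-major.
struct ControlVector {
    std::string        name;     // identifier chosen at load time
    float              scale;    // strength set by the "apply" step
    bool               applied;  // false once the vector is unloaded
    std::vector<float> data;
};

// Dimension check performed on load: the payload must be a non-empty
// whole number of n_embd-sized layer blocks, otherwise it is rejected.
bool is_compatible(const ControlVector & cv, std::size_t n_embd) {
    return n_embd > 0 && !cv.data.empty() && cv.data.size() % n_embd == 0;
}

// Single aggregation routine that load, unload, and apply could all call:
// sums scale * data over every active vector. Vectors may cover different
// numbers of layers; the result is as long as the longest active vector.
std::vector<float> aggregate(const std::vector<ControlVector> & cvs,
                             std::size_t n_embd) {
    std::vector<float> out;
    for (const ControlVector & cv : cvs) {
        if (!cv.applied || cv.scale == 0.0f) {
            continue;  // unloaded or zero-scaled vectors contribute nothing
        }
        assert(is_compatible(cv, n_embd));  // validated earlier, on load
        if (cv.data.size() > out.size()) {
            out.resize(cv.data.size(), 0.0f);
        }
        for (std::size_t i = 0; i < cv.data.size(); ++i) {
            out[i] += cv.scale * cv.data[i];
        }
    }
    return out;
}
```

In this sketch, every task that changes the set of vectors or their scales would recompute the sum with `aggregate` and hand the result to the model context once, which is one way to keep load, unload, and apply on a shared code path.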
85 KiB