This implements the ability to load, unload, and scale control vectors
(representation engineering) mid-inference, following the existing
task-queue pattern used by LoRA adapters.

New Endpoints:
- GET /control-vectors
- POST /control-vectors/load
- POST /control-vectors/unload
- POST /control-vectors/apply (handles scaling)

Technical Notes:
- Centralizes vector aggregation logic so the load, unload, and apply
  tasks share a single implementation.
- Vectors are applied globally to the model context, not per request.
- Validates vector dimensions on load and rejects incompatible
  vectors.
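
For reviewers, the shared aggregation and the dimension check described in the notes above can be sketched roughly as follows. This is a minimal illustration, not the code in this patch: `ControlVector`, `is_compatible`, and `aggregate` are hypothetical names, and the flat "n_embd floats per layer" layout and the scale-weighted sum are assumptions about how the vectors are combined.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical type for illustration; not the name used in the patch.
struct ControlVector {
    int n_embd;               // embedding width the vector was built for
    std::vector<float> data;  // assumed layout: n_embd floats per layer
    float scale;              // strength, as set through the apply endpoint
};

// Dimension check run before a vector is accepted.
bool is_compatible(const ControlVector & cv, int model_n_embd) {
    return model_n_embd > 0
        && cv.n_embd == model_n_embd
        && !cv.data.empty()
        && cv.data.size() % static_cast<std::size_t>(cv.n_embd) == 0;
}

// Sum every loaded vector, weighted by its scale, into one flat buffer.
// Load, unload, and apply can each call this after changing the set.
// Returns false (leaving `out` empty) if any vector is incompatible.
bool aggregate(const std::vector<ControlVector> & loaded,
               int model_n_embd,
               std::vector<float> & out) {
    out.clear();
    for (const auto & cv : loaded) {
        if (!is_compatible(cv, model_n_embd)) {
            out.clear();
            return false;
        }
        // Vectors may cover different numbers of layers; grow as needed.
        if (cv.data.size() > out.size()) {
            out.resize(cv.data.size(), 0.0f);
        }
        for (std::size_t i = 0; i < cv.data.size(); ++i) {
            out[i] += cv.scale * cv.data[i];
        }
    }
    return true;
}
```

With one routine rebuilding the combined buffer from the current set, unloading a vector or changing its scale needs no special-case arithmetic: the task updates the set and recomputes.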

Co-authored-by: Gapeleon <gapeleon@users.noreply.github.com>