Use embeddings_device as the parameter for device to remove ambiguity. Signed-off-by: kingbri <bdashore3@proton.me>
Same as the normal model container. Signed-off-by: kingbri <bdashore3@proton.me>
Run unload async functions before exiting the program. Signed-off-by: kingbri <bdashore3@proton.me>
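A minimal sketch of the shutdown behavior described above, assuming each container exposes an async `unload()` method. `DummyContainer`, `unload_all`, and `shutdown` are hypothetical names used only for illustration and are not taken from the project.

```python
import asyncio


class DummyContainer:
    """Hypothetical stand-in for a model container with an async unload."""

    def __init__(self, name: str):
        self.name = name
        self.loaded = True

    async def unload(self):
        # Yield to the event loop once to mimic real async cleanup work.
        await asyncio.sleep(0)
        self.loaded = False


async def unload_all(containers):
    # Run every unload coroutine concurrently. return_exceptions=True keeps
    # one failing unload from preventing the others from finishing.
    return await asyncio.gather(
        *(c.unload() for c in containers), return_exceptions=True
    )


def shutdown(containers):
    # Called from synchronous exit code: drive the async unloads to
    # completion before the process is allowed to exit.
    asyncio.run(unload_all(containers))
```

`asyncio.run` starts a fresh event loop, so this shape only fits an exit path where no loop is already running; inside a running loop the coroutine would be awaited directly instead.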
Use Infinity as a separate backend and handle the model within the common module. This separates the embeddings model from the endpoint, which allows model loading and unloading in core. Signed-off-by: kingbri <bdashore3@proton.me>
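The separation described above could look roughly like the following sketch. It assumes a module-level container that core code loads and unloads, while the endpoint only reads from it. `FakeEmbeddingsBackend` is a hypothetical stand-in for the real backend and does not reflect Infinity's API; every function name here is likewise an assumption. Only the `embeddings_device` parameter name comes from the commits above.

```python
import asyncio
from typing import List, Optional


class FakeEmbeddingsBackend:
    """Hypothetical stand-in for the embeddings backend (not Infinity's API)."""

    def __init__(self, model_name: str, embeddings_device: str):
        self.model_name = model_name
        self.embeddings_device = embeddings_device
        self.running = False

    async def start(self):
        self.running = True

    async def stop(self):
        self.running = False

    async def embed(self, texts: List[str]) -> List[List[float]]:
        # Toy embedding: one dimension holding the text length.
        return [[float(len(t))] for t in texts]


# Module-level state owned by the common module, not by the endpoint.
_container: Optional[FakeEmbeddingsBackend] = None


async def load_embeddings_model(model_name: str, embeddings_device: str = "cpu"):
    """Load a model, unloading any previously loaded one first."""
    global _container
    if _container is not None:
        await unload_embeddings_model()
    backend = FakeEmbeddingsBackend(model_name, embeddings_device)
    await backend.start()
    _container = backend
    return backend


async def unload_embeddings_model():
    """Stop the backend and clear the shared container."""
    global _container
    if _container is not None:
        await _container.stop()
        _container = None


async def embeddings_endpoint(texts: List[str]) -> List[List[float]]:
    """Endpoint-side code: uses the shared container, never loads a model."""
    if _container is None:
        raise RuntimeError("No embeddings model is loaded")
    return await _container.embed(texts)
```

Because the endpoint never constructs the backend itself, the same load and unload functions can be called from startup, shutdown, or a management route without touching the request handler.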