Embeddings: Switch to Infinity

Infinity-emb is an async batching engine for embeddings. It is
preferable to sentence-transformers because it handles scalable use
cases without the caller having to manage threads externally.

Signed-off-by: kingbri <bdashore3@proton.me>
kingbri
2024-07-29 13:42:03 -04:00
parent c9a5d2c363
commit 3f21d9ef96
4 changed files with 87 additions and 100 deletions
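
To illustrate what the commit message means by "async batching", here is a minimal stdlib-only sketch. It is not Infinity's implementation: `MicroBatcher`, `fake_embed`, `max_batch`, and `max_wait` are hypothetical names chosen for this example, and the "model" is a placeholder function. The idea shown is that concurrent callers each await a future while a single worker drains a queue and embeds the collected texts in one batched call, so no caller has to manage threads.

```python
import asyncio
from typing import List, Tuple


class MicroBatcher:
    """Collects concurrent requests and embeds them in one batched call."""

    def __init__(self, max_batch: int = 8, max_wait: float = 0.01) -> None:
        self.max_batch = max_batch      # hypothetical cap on batch size
        self.max_wait = max_wait        # hypothetical wait window in seconds
        self.queue: asyncio.Queue = asyncio.Queue()
        self.batch_sizes: List[int] = []  # recorded for inspection only

    @staticmethod
    def fake_embed(texts: List[str]) -> List[List[float]]:
        # Placeholder for a real model forward pass over the whole batch.
        return [[float(len(t)), float(sum(map(ord, t)) % 7)] for t in texts]

    async def embed(self, text: str) -> List[float]:
        # Each caller enqueues its text and waits on its own future.
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((text, fut))
        return await fut

    async def run(self) -> None:
        loop = asyncio.get_running_loop()
        while True:
            # Block until at least one request arrives.
            batch = [await self.queue.get()]
            deadline = loop.time() + self.max_wait
            # Keep collecting until the batch is full or the window closes.
            while len(batch) < self.max_batch:
                timeout = deadline - loop.time()
                if timeout <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self.queue.get(), timeout))
                except asyncio.TimeoutError:
                    break
            self.batch_sizes.append(len(batch))
            vectors = self.fake_embed([text for text, _ in batch])
            # Hand each caller the vector that matches its own text.
            for (_, fut), vec in zip(batch, vectors):
                fut.set_result(vec)


async def demo() -> Tuple[List[List[float]], List[int]]:
    batcher = MicroBatcher()
    worker = asyncio.create_task(batcher.run())
    results = await asyncio.gather(*(batcher.embed(t) for t in ["a", "bb", "ccc"]))
    worker.cancel()
    return results, batcher.batch_sizes
```

Calling `asyncio.run(demo())` returns one vector per input text, in input order, along with the batch sizes the worker recorded. A real engine would replace `fake_embed` with a model call and would typically run it off the event loop.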


@@ -135,6 +135,6 @@ async def chat_completion_request(
     dependencies=[Depends(check_api_key), Depends(check_model_container)],
 )
 async def handle_embeddings(data: EmbeddingsRequest) -> EmbeddingsResponse:
-    response = await embeddings(data.input, data.encoding_format, data.model)
+    response = await embeddings(data)
     return response
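
The hunk above changes the call so that the handler passes the whole request object rather than three unpacked fields. The sketch below shows that shape in isolation. The field names `input`, `encoding_format`, and `model` come from the removed line; the field types, the defaults, the dict return value, and the body of `embeddings` are assumptions made for illustration and are not taken from the project.

```python
import asyncio
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class EmbeddingsRequest:
    # Field names come from the removed call; types and defaults are assumed.
    input: Union[str, List[str]]
    encoding_format: str = "float"
    model: Optional[str] = None


async def embeddings(data: EmbeddingsRequest) -> dict:
    # Hypothetical body: the helper now reads what it needs from the request,
    # so adding a request field later does not change this signature.
    texts = [data.input] if isinstance(data.input, str) else list(data.input)
    return {
        "model": data.model,
        "encoding_format": data.encoding_format,
        "count": len(texts),
    }


async def handle_embeddings(data: EmbeddingsRequest) -> dict:
    # Mirrors the added line in the diff: pass the request through unchanged.
    response = await embeddings(data)
    return response
```

For example, `asyncio.run(handle_embeddings(EmbeddingsRequest(input=["a", "b"])))` yields a dict whose `count` is 2. The design benefit is that the route handler stays a thin pass-through while the helper owns the interpretation of the request.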