server: add /v1/responses support (#1184)

* server: add /v1/responses support

* server: fix Responses API model fallback and SSE branching
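
For context, a rough sketch of a non-streaming call against the new
endpoint. It assumes a llama.cpp server listening on localhost:8080 and
OpenAI-style Responses fields (model, input, max_output_tokens); the
exact request schema accepted by this commit is defined in the server
code, not in this excerpt.

    # Hedged sketch: the /v1/responses path comes from this commit; the
    # field names follow the public OpenAI Responses API and are assumed
    # to be mapped the same way here.
    import json
    import urllib.request

    payload = {
        "model": "llama-2",                # model name from the test table
        "input": "What is the best book",  # Responses API uses `input`, not `messages`
        "max_output_tokens": 8,
    }
    req = urllib.request.Request(
        "http://localhost:8080/v1/responses",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # OpenAI-style responses carry generated items under `output`;
    # fall back to dumping the whole body if the key differs.
    print(body.get("output", body))
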
Author: RodriMora
Date: 2026-02-14 08:30:18 +01:00
Committed by: GitHub
Parent: 1cb7e1bf39
Commit: 102f77b7d3
10 changed files with 926 additions and 7 deletions

@@ -71,6 +71,22 @@ Feature: llama.cpp server
       | codellama70b | You are a coding assistant. | Write the fibonacci function in c++. | 128        | (thanks\|happy\|bird\|Annabyear)+ | -1       | 64          | enabled          |        |
+
+  Scenario Outline: OAI Responses Compatibility
+    Given a model <model>
+    And a system prompt <system_prompt>
+    And a user prompt <user_prompt>
+    And <max_tokens> max tokens to predict
+    And streaming is <enable_streaming>
+    Given an OAI compatible responses request with no api error
+    Then <n_predicted> tokens are predicted matching <re_content>
+    And <n_prompt> prompt tokens are processed
+
+    Examples: Prompts
+      | model        | system_prompt               | user_prompt                          | max_tokens | re_content                        | n_prompt | n_predicted | enable_streaming |
+      | llama-2      | Book                        | What is the best book                | 8          | (Here\|what)+                     | 77       | 8           | disabled         |
+      | codellama70b | You are a coding assistant. | Write the fibonacci function in c++. | 128        | (thanks\|happy\|bird\|Annabyear)+ | -1       | 64          | enabled          |
+
   Scenario Outline: OAI Compatibility w/ response format
     Given a model test
     And a system prompt test
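
The enable_streaming column in the new Examples table exercises the SSE
branch that the follow-up fix touches. Below is a minimal sketch of
consuming that stream, assuming the endpoint emits `data:`-prefixed
JSON events terminated by a [DONE] sentinel, as the existing chat
completions endpoint does; the actual event schema for /v1/responses is
not shown in this hunk.

    # Hedged sketch: `stream: true` and the SSE wire format are assumptions
    # based on the OAI chat completions convention, not this diff.
    import json
    import urllib.request

    payload = {
        "model": "llama-2",
        "input": "What is the best book",
        "max_output_tokens": 8,
        "stream": True,
    }
    req = urllib.request.Request(
        "http://localhost:8080/v1/responses",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Accept": "text/event-stream"},
    )
    with urllib.request.urlopen(req) as resp:
        for raw in resp:                      # iterate SSE lines as they arrive
            line = raw.decode("utf-8").strip()
            if not line.startswith("data: "):
                continue                      # skip blanks and keep-alive comments
            chunk = line[len("data: "):]
            if chunk == "[DONE]":             # end-of-stream sentinel
                break
            print(json.loads(chunk))          # one JSON event per data line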