server: add /v1/responses support (#1184)

* server: add /v1/responses support

* server: fix Responses API model fallback and SSE branching
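
For context, a rough sketch of a non-streaming call against the new
endpoint. It assumes a llama.cpp server listening on localhost:8080 and
OpenAI-style Responses fields (model, input, max_output_tokens); the
exact request schema accepted by this commit is defined in the server
code, not in this excerpt.

    # Hedged sketch: the /v1/responses path comes from this commit; the
    # field names follow the public OpenAI Responses API and are assumed
    # to be mapped the same way here.
    import json
    import urllib.request

    payload = {
        "model": "llama-2",                # model name from the test table
        "input": "What is the best book",  # Responses API uses `input`, not `messages`
        "max_output_tokens": 8,
    }
    req = urllib.request.Request(
        "http://localhost:8080/v1/responses",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # OpenAI-style responses carry generated items under `output`;
    # fall back to dumping the whole body if the key differs.
    print(body.get("output", body))
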
Author: RodriMora
Date: 2026-02-14 08:30:18 +01:00
Committed by: GitHub
Parent: 1cb7e1bf39
Commit: 102f77b7d3
10 changed files with 926 additions and 7 deletions

@@ -71,6 +71,22 @@ Feature: llama.cpp server
       | codellama70b | You are a coding assistant. | Write the fibonacci function in c++. | 128        | (thanks\|happy\|bird\|Annabyear)+ | -1       | 64          | enabled          |        |
+
+  Scenario Outline: OAI Responses Compatibility
+    Given a model <model>
+    And a system prompt <system_prompt>
+    And a user prompt <user_prompt>
+    And <max_tokens> max tokens to predict
+    And streaming is <enable_streaming>
+    Given an OAI compatible responses request with no api error
+    Then <n_predicted> tokens are predicted matching <re_content>
+    And <n_prompt> prompt tokens are processed
+
+    Examples: Prompts
+      | model        | system_prompt               | user_prompt                          | max_tokens | re_content                        | n_prompt | n_predicted | enable_streaming |
+      | llama-2      | Book                        | What is the best book                | 8          | (Here\|what)+                     | 77       | 8           | disabled         |
+      | codellama70b | You are a coding assistant. | Write the fibonacci function in c++. | 128        | (thanks\|happy\|bird\|Annabyear)+ | -1       | 64          | enabled          |
+
   Scenario Outline: OAI Compatibility w/ response format
     Given a model test
     And a system prompt test
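
The enable_streaming column in the new Examples table exercises the SSE
branch that the follow-up fix touches. Below is a minimal sketch of
consuming that stream, assuming the endpoint emits `data:`-prefixed
JSON events terminated by a [DONE] sentinel, as the existing chat
completions endpoint does; the actual event schema for /v1/responses is
not shown in this hunk.

    # Hedged sketch: `stream: true` and the SSE wire format are assumptions
    # based on the OAI chat completions convention, not this diff.
    import json
    import urllib.request

    payload = {
        "model": "llama-2",
        "input": "What is the best book",
        "max_output_tokens": 8,
        "stream": True,
    }
    req = urllib.request.Request(
        "http://localhost:8080/v1/responses",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Accept": "text/event-stream"},
    )
    with urllib.request.urlopen(req) as resp:
        for raw in resp:                      # iterate SSE lines as they arrive
            line = raw.decode("utf-8").strip()
            if not line.startswith("data: "):
                continue                      # skip blanks and keep-alive comments
            chunk = line[len("data: "):]
            if chunk == "[DONE]":             # end-of-stream sentinel
                break
            print(json.loads(chunk))          # one JSON event per data line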