mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-01-26 17:20:01 +00:00

Files

firecoperana 1cad1ec1cc Update grammar (#1023 )

* grammar : fix JSON Schema for string regex with top-level alt. (#9903)

Prior to this commit, using a JSON Schema containing a string
with `pattern` regular expression that uses top-level alternation
(e.g. `"pattern": "^A|B|C|D$"`) would result in invalid JSON
output from the constrained sampling grammar, because it
ended up creating a grammar rule like this for the string:

```
thing ::= "\"" "A" | "B" | "C" | "D" "\"" space
```

Note that this rule will only match a starting quote for the "A" case,
and will only match an ending quote for the "D" case,
so this rule will always produce invalid JSON when used for sampling
(that is, the JSON will always be lacking the starting quote,
the ending quote, or both).

This was fixed in a simple way by adding parentheses to the
generated rule (for all string pattern rules, to keep it simple),
such that the new generated rule looks like this (correct):

```
thing ::= "\"" ("A" | "B" | "C" | "D") "\"" space
```

* grammars : add English-only grammar (#10612)

* grammar : handle maxItems == 0 in JSON schema (#13117)

Co-authored-by: Richard Lyons <frob@cloudstaff.com>

* grammar-parser : fix possible null-deref (#9004)

Fixes: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=70680

Signed-off-by: David Korczynski <david@adalogics.com>

* llama : fix typo in llama-grammar.h [no ci] (#11816)

* * server: fix "--grammar-file" parameter (#12285)

* common : use std::string_view now that we target c++17 (#14319)

* json : support `enum` values within `allOf` (#15830)

* grammar : use int64_t to avoid int overflows in int schema to grammar conversion logic (#16626)

* grammar : support array references in json schema (#16792)

* grammar : support array references in json schema

* Update json-schema-to-grammar.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* grammar : improve regex when naming ref derived rules

* grammar : replace non-conformant definitions array with anyOf test case

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
# Conflicts:
#	tests/test-json-schema-to-grammar.cpp

* merge fix

* llama : minor grammar refactor (#10897)

* llama: fix error on bad grammar (#12628)

* grammar : fix integer overflow (#17381)

* Fix DoS / integer overflow

* Remove optional, use INT64_MAX instead as placeholder value (it's technically -1, so it fits :)

* White space

* Actually, since it's unsigned, use UINT64_MAX
# Conflicts:
#	src/llama-grammar.cpp

* grammar: fix regression caused by #17381 (#17412)

* grammar: fix regression caused by #17381

* more readable
# Conflicts:
#	src/llama-grammar.cpp

* Merge Fix

* Fix warnings

---------

Signed-off-by: David Korczynski <david@adalogics.com>
Co-authored-by: Joe Eli McIlvain <joe.eli.mac@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: frob <rick+github@frob.com.au>
Co-authored-by: Richard Lyons <frob@cloudstaff.com>
Co-authored-by: DavidKorczynski <david@adalogics.com>
Co-authored-by: Daniel Bevenius <daniel.bevenius@gmail.com>
Co-authored-by: firecoperana <firecoperana>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Co-authored-by: Aldehir Rojas <hello@alde.dev>
Co-authored-by: Olivier Chafik <olivier.chafik@gmail.com>
Co-authored-by: Piotr Wilkin (ilintar) <piotr.wilkin@syndatis.com>
Co-authored-by: Xuan-Son Nguyen <son@huggingface.co>
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>

2025-11-30 18:45:38 +01:00

arithmetic.gbnf

llama : add grammar-based sampling (#1773 )

2023-07-23 23:58:10 -04:00

c.gbnf

examples : add C grammar (#2357 )

2023-09-01 16:32:14 +03:00

chess.gbnf

llama : add grammar-based sampling (#1773 )

2023-07-23 23:58:10 -04:00

english.gbnf

Update grammar (#1023 )

2025-11-30 18:45:38 +01:00

japanese.gbnf

llama : add grammar-based sampling (#1773 )

2023-07-23 23:58:10 -04:00

json_arr.gbnf

json: refine constraint for whitespace to avoid runaways yet allow pretty print (#7866 )

2024-06-11 02:22:57 +01:00

json.gbnf

json: refine constraint for whitespace to avoid runaways yet allow pretty print (#7866 )

2024-06-11 02:22:57 +01:00

list.gbnf

llama : add grammar-based sampling (#1773 )

2023-07-23 23:58:10 -04:00

README.md

Merge mainline llama.cpp (#3 )

2024-07-27 07:55:01 +02:00

README.md

GBNF Guide

GBNF (GGML BNF) is a format for defining formal grammars to constrain model outputs in llama.cpp. For example, you can use it to force the model to generate valid JSON, or speak only in emojis. GBNF grammars are supported in various ways in examples/main and examples/server.

Background

Backus-Naur Form (BNF) is a notation for describing the syntax of formal languages like programming languages, file formats, and protocols. GBNF is an extension of BNF that primarily adds a few modern regex-like features.

Basics

In GBNF, we define production rules that specify how a non-terminal (rule name) can be replaced with sequences of terminals (characters, specifically Unicode code points) and other non-terminals. The basic format of a production rule is nonterminal ::= sequence....

Example

Before going deeper, let's look at some of the features demonstrated in grammars/chess.gbnf, a small chess notation grammar:

# `root` specifies the pattern for the overall output
root ::= (
    # it must start with the characters "1. " followed by a sequence
    # of characters that match the `move` rule, followed by a space, followed
    # by another move, and then a newline
    "1. " move " " move "\n"

    # it's followed by one or more subsequent moves, numbered with one or two digits
    ([1-9] [0-9]? ". " move " " move "\n")+
)

# `move` is an abstract representation, which can be a pawn, nonpawn, or castle.
# The `[+#]?` denotes the possibility of checking or mate signs after moves
move ::= (pawn | nonpawn | castle) [+#]?

pawn ::= ...
nonpawn ::= ...
castle ::= ...

Non-Terminals and Terminals

Non-terminal symbols (rule names) stand for a pattern of terminals and other non-terminals. They are required to be a dashed lowercase word, like move, castle, or check-mate.

Terminals are actual characters (code points). They can be specified as a sequence like "1" or "O-O" or as ranges like [1-9] or [NBKQR].

Characters and character ranges

Terminals support the full range of Unicode. Unicode characters can be specified directly in the grammar, for example hiragana ::= [ぁ-ゟ], or with escapes: 8-bit (\xXX), 16-bit (\uXXXX) or 32-bit (\UXXXXXXXX).

Character ranges can be negated with ^:

single-line ::= [^\n]+ "\n"`

Sequences and Alternatives

The order of symbols in a sequence matters. For example, in "1. " move " " move "\n", the "1. " must come before the first move, etc.

Alternatives, denoted by |, give different sequences that are acceptable. For example, in move ::= pawn | nonpawn | castle, move can be a pawn move, a nonpawn move, or a castle.

Parentheses () can be used to group sequences, which allows for embedding alternatives in a larger rule or applying repetition and optional symbols (below) to a sequence.

Repetition and Optional Symbols

* after a symbol or sequence means that it can be repeated zero or more times (equivalent to {0,}).
+ denotes that the symbol or sequence should appear one or more times (equivalent to {1,}).
? makes the preceding symbol or sequence optional (equivalent to {0,1}).
{m} repeats the precedent symbol or sequence exactly m times
{m,} repeats the precedent symbol or sequence at least m times
{m,n} repeats the precedent symbol or sequence at between m and n times (included)
{0,n} repeats the precedent symbol or sequence at most n times (included)

Comments and newlines

Comments can be specified with #:

# defines optional whitespace
ws ::= [ \t\n]+

Newlines are allowed between rules and between symbols or sequences nested inside parentheses. Additionally, a newline after an alternate marker | will continue the current rule, even outside of parentheses.

The root rule

In a full grammar, the root rule always defines the starting point of the grammar. In other words, it specifies what the entire output must match.

# a grammar for lists
root ::= ("- " item)+
item ::= [^\n]+ "\n"

Next steps

This guide provides a brief overview. Check out the GBNF files in this directory (grammars/) for examples of full grammars. You can try them out with:

./llama-cli -m <model> --grammar-file grammars/some-grammar.gbnf -p 'Some prompt'

llama.cpp can also convert JSON schemas to grammars either ahead of time or at each request, see below.

Troubleshooting

Grammars currently have performance gotchas (see https://github.com/ggerganov/llama.cpp/issues/4218).

Efficient optional repetitions

A common pattern is to allow repetitions of a pattern x up to N times.

While semantically correct, the syntax x? x? x?.... x? (with N repetitions) may result in extremely slow sampling. Instead, you can write x{0,N} (or (x (x (x ... (x)?...)?)?)? w/ N-deep nesting in earlier llama.cpp versions).

Using GBNF grammars

You can use GBNF grammars:

In llama-server's completion endpoints, passed as the grammar body field
In llama-cli, passed as the --grammar & --grammar-file flags
With llama-gbnf-validator tool, to test them against strings.

JSON Schemas → GBNF

llama.cpp supports converting a subset of https://json-schema.org/ to GBNF grammars:

In llama-server:
- For any completion endpoints, passed as the json_schema body field
- For the /chat/completions endpoint, passed inside the result_format body field (e.g. {"type", "json_object", "schema": {"items": {}}})
In llama-cli, passed as the --json / -j flag
To convert to a grammar ahead of time:
- in CLI, with examples/json_schema_to_grammar.py
- in JavaScript with json-schema-to-grammar.mjs (this is used by the server's Web UI)

Take a look at tests to see which features are likely supported (you'll also find usage examples in https://github.com/ggerganov/llama.cpp/pull/5978, https://github.com/ggerganov/llama.cpp/pull/6659 & https://github.com/ggerganov/llama.cpp/pull/6555).

llama-cli \
  -hfr bartowski/Phi-3-medium-128k-instruct-GGUF \
  -hff Phi-3-medium-128k-instruct-Q8_0.gguf \
  -j '{
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "name": {
                "type": "string",
                "minLength": 1,
                "maxLength": 100
            },
            "age": {
                "type": "integer",
                "minimum": 0,
                "maximum": 150
            }
        },
        "required": ["name", "age"],
        "additionalProperties": false
    },
    "minItems": 10,
    "maxItems": 100
  }' \
  -p 'Generate a {name, age}[] JSON array with famous actors of all ages.'

Show grammar

You can convert any schema in command-line with:

examples/json_schema_to_grammar.py name-age-schema.json

char ::= [^"\\\x7F\x00-\x1F] | [\\] (["\\bfnrt] | "u" [0-9a-fA-F]{4})
item ::= "{" space item-name-kv "," space item-age-kv "}" space
item-age ::= ([0-9] | ([1-8] [0-9] | [9] [0-9]) | "1" ([0-4] [0-9] | [5] "0")) space
item-age-kv ::= "\"age\"" space ":" space item-age
item-name ::= "\"" char{1,100} "\"" space
item-name-kv ::= "\"name\"" space ":" space item-name
root ::= "[" space item ("," space item){9,99} "]" space
space ::= | " " | "\n" [ \t]{0,20}

Here is also a list of known limitations (contributions welcome):

additionalProperties defaults to false (produces faster grammars + reduces hallucinations).
"additionalProperties": true may produce keys that contain unescaped newlines.
Unsupported features are skipped silently. It is currently advised to use the command-line Python converter (see above) to see any warnings, and to inspect the resulting grammar / test it w/ llama-gbnf-validator.
Can't mix properties w/ anyOf / oneOf in the same type (https://github.com/ggerganov/llama.cpp/issues/7703)
prefixItems is broken (but items works)
minimum, exclusiveMinimum, maximum, exclusiveMaximum: only supported for "type": "integer" for now, not number
Nested $refs are broken (https://github.com/ggerganov/llama.cpp/issues/8073)
patterns must start with ^ and end with $
Remote $refs not supported in the C++ version (Python & JavaScript versions fetch https refs)
string formats lack uri, email
No patternProperties

And a non-exhaustive list of other unsupported features that are unlikely to be implemented (hard and/or too slow to support w/ stateless grammars):

uniqueItems
contains / minContains
$anchor (cf. dereferencing)
not
Conditionals if / then / else / dependentSchemas

A word about additionalProperties

Warning

The JSON schemas spec states objects accept additional properties by default. Since this is slow and seems prone to hallucinations, we default to no additional properties. You can set "additionalProperties": true in the the schema of any object to explicitly allow additional properties.

If you're using Pydantic to generate schemas, you can enable additional properties with the extra config on each model class:

# pip install pydantic
import json
from typing import Annotated, List
from pydantic import BaseModel, Extra, Field
class QAPair(BaseModel):
    class Config:
        extra = 'allow'  # triggers additionalProperties: true in the JSON schema
    question: str
    concise_answer: str
    justification: str

class Summary(BaseModel):
    class Config:
        extra = 'allow'
    key_facts: List[Annotated[str, Field(pattern='- .{5,}')]]
    question_answers: List[Annotated[List[QAPair], Field(min_items=5)]]

print(json.dumps(Summary.model_json_schema(), indent=2))

Show JSON schema & grammar

{
  "$defs": {
    "QAPair": {
      "additionalProperties": true,
      "properties": {
        "question": {
          "title": "Question",
          "type": "string"
        },
        "concise_answer": {
          "title": "Concise Answer",
          "type": "string"
        },
        "justification": {
          "title": "Justification",
          "type": "string"
        }
      },
      "required": [
        "question",
        "concise_answer",
        "justification"
      ],
      "title": "QAPair",
      "type": "object"
    }
  },
  "additionalProperties": true,
  "properties": {
    "key_facts": {
      "items": {
        "pattern": "^- .{5,}$",
        "type": "string"
      },
      "title": "Key Facts",
      "type": "array"
    },
    "question_answers": {
      "items": {
        "items": {
          "$ref": "#/$defs/QAPair"
        },
        "minItems": 5,
        "type": "array"
      },
      "title": "Question Answers",
      "type": "array"
    }
  },
  "required": [
    "key_facts",
    "question_answers"
  ],
  "title": "Summary",
  "type": "object"
}

QAPair ::= "{" space QAPair-question-kv "," space QAPair-concise-answer-kv "," space QAPair-justification-kv ( "," space ( QAPair-additional-kv ( "," space QAPair-additional-kv )* ) )? "}" space
QAPair-additional-k ::= ["] ( [c] ([o] ([n] ([c] ([i] ([s] ([e] ([_] ([a] ([n] ([s] ([w] ([e] ([r] char+ | [^"r] char*) | [^"e] char*) | [^"w] char*) | [^"s] char*) | [^"n] char*) | [^"a] char*) | [^"_] char*) | [^"e] char*) | [^"s] char*) | [^"i] char*) | [^"c] char*) | [^"n] char*) | [^"o] char*) | [j] ([u] ([s] ([t] ([i] ([f] ([i] ([c] ([a] ([t] ([i] ([o] ([n] char+ | [^"n] char*) | [^"o] char*) | [^"i] char*) | [^"t] char*) | [^"a] char*) | [^"c] char*) | [^"i] char*) | [^"f] char*) | [^"i] char*) | [^"t] char*) | [^"s] char*) | [^"u] char*) | [q] ([u] ([e] ([s] ([t] ([i] ([o] ([n] char+ | [^"n] char*) | [^"o] char*) | [^"i] char*) | [^"t] char*) | [^"s] char*) | [^"e] char*) | [^"u] char*) | [^"cjq] char* )? ["] space
QAPair-additional-kv ::= QAPair-additional-k ":" space value
QAPair-concise-answer-kv ::= "\"concise_answer\"" space ":" space string
QAPair-justification-kv ::= "\"justification\"" space ":" space string
QAPair-question-kv ::= "\"question\"" space ":" space string
additional-k ::= ["] ( [k] ([e] ([y] ([_] ([f] ([a] ([c] ([t] ([s] char+ | [^"s] char*) | [^"t] char*) | [^"c] char*) | [^"a] char*) | [^"f] char*) | [^"_] char*) | [^"y] char*) | [^"e] char*) | [q] ([u] ([e] ([s] ([t] ([i] ([o] ([n] ([_] ([a] ([n] ([s] ([w] ([e] ([r] ([s] char+ | [^"s] char*) | [^"r] char*) | [^"e] char*) | [^"w] char*) | [^"s] char*) | [^"n] char*) | [^"a] char*) | [^"_] char*) | [^"n] char*) | [^"o] char*) | [^"i] char*) | [^"t] char*) | [^"s] char*) | [^"e] char*) | [^"u] char*) | [^"kq] char* )? ["] space
additional-kv ::= additional-k ":" space value
array ::= "[" space ( value ("," space value)* )? "]" space
boolean ::= ("true" | "false") space
char ::= [^"\\\x7F\x00-\x1F] | [\\] (["\\bfnrt] | "u" [0-9a-fA-F]{4})
decimal-part ::= [0-9]{1,16}
dot ::= [^\x0A\x0D]
integral-part ::= [0] | [1-9] [0-9]{0,15}
key-facts ::= "[" space (key-facts-item ("," space key-facts-item)*)? "]" space
key-facts-item ::= "\"" "- " key-facts-item-1{5,} "\"" space
key-facts-item-1 ::= dot
key-facts-kv ::= "\"key_facts\"" space ":" space key-facts
null ::= "null" space
number ::= ("-"? integral-part) ("." decimal-part)? ([eE] [-+]? integral-part)? space
object ::= "{" space ( string ":" space value ("," space string ":" space value)* )? "}" space
question-answers ::= "[" space (question-answers-item ("," space question-answers-item)*)? "]" space
question-answers-item ::= "[" space question-answers-item-item ("," space question-answers-item-item){4,} "]" space
question-answers-item-item ::= QAPair
question-answers-kv ::= "\"question_answers\"" space ":" space question-answers
root ::= "{" space key-facts-kv "," space question-answers-kv ( "," space ( additional-kv ( "," space additional-kv )* ) )? "}" space
space ::= | " " | "\n" [ \t]{0,20}
string ::= "\"" char* "\"" space
value ::= object | array | string | number | boolean | null

If you're using Zod, you can make your objects to explicitly allow extra properties w/ nonstrict() / passthrough() (or explicitly no extra props w/ z.object(...).strict() or z.strictObject(...)) but note that zod-to-json-schema currently always sets "additionalProperties": false anyway.

import { z } from 'zod';
import { zodToJsonSchema } from 'zod-to-json-schema';

const Foo = z.object({
  age: z.number().positive(),
  email: z.string().email(),
}).strict();

console.log(zodToJsonSchema(Foo));

Show JSON schema & grammar

{
  "type": "object",
  "properties": {
    "age": {
      "type": "number",
      "exclusiveMinimum": 0
    },
    "email": {
      "type": "string",
      "format": "email"
    }
  },
  "required": [
    "age",
    "email"
  ],
  "additionalProperties": false,
  "$schema": "http://json-schema.org/draft-07/schema#"
}

age-kv ::= "\"age\"" space ":" space number
char ::= [^"\\\x7F\x00-\x1F] | [\\] (["\\bfnrt] | "u" [0-9a-fA-F]{4})
decimal-part ::= [0-9]{1,16}
email-kv ::= "\"email\"" space ":" space string
integral-part ::= [0] | [1-9] [0-9]{0,15}
number ::= ("-"? integral-part) ("." decimal-part)? ([eE] [-+]? integral-part)? space
root ::= "{" space age-kv "," space email-kv "}" space
space ::= | " " | "\n" [ \t]{0,20}
string ::= "\"" char* "\"" space