Two changes:
1. Add four missing environment variable bindings to
gpt_params_parse_from_env():
- LLAMA_ARG_CACHE_TYPE_K (string, e.g. "q8_0")
- LLAMA_ARG_CACHE_TYPE_V (string, e.g. "q8_0")
- LLAMA_ARG_MLOCK (bool, "1"/"true")
- LLAMA_ARG_K_CACHE_HADAMARD (bool, "1"/"true")
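2. Call gpt_params_parse_from_env() from gpt_params_parse() so that
every tool that parses its arguments through gpt_params_parse()
(llama-cli, llama-bench, etc.) respects env vars, not just
llama-server. Env vars are applied first and act as defaults; CLI
flags are parsed afterwards and override them.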
Follows the existing get_env() pattern and uses the same
LLAMA_ARG_ prefix convention as the other env vars.
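A minimal C++ sketch of the get_env() pattern applied to the four new bindings. It is a standalone reimplementation for illustration, not the code in common.cpp. The struct env_params_sketch, its field names, its "f16" defaults, and parse_from_env_sketch() are hypothetical. It also assumes that a boolean variable which is set to anything other than "1"/"true" (e.g. "0") turns the flag off.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical stand-in for the relevant gpt_params fields; the real struct
// has many more members and its field names and defaults may differ.
struct env_params_sketch {
    std::string cache_type_k     = "f16";
    std::string cache_type_v     = "f16";
    bool        use_mlock        = false;
    bool        k_cache_hadamard = false;
};

// String binding: copy the variable's value if it is set, else keep the default.
static void get_env(const char * name, std::string & target) {
    if (const char * value = std::getenv(name)) {
        target = value;
    }
}

// Bool binding: "1" or "true" turns the flag on. Assumption: any other value,
// when the variable is set, turns it off; an unset variable keeps the default.
static void get_env(const char * name, bool & target) {
    if (const char * value = std::getenv(name)) {
        const std::string str(value);
        target = (str == "1" || str == "true");
    }
}

// Sketch of the four bindings added to gpt_params_parse_from_env().
static void parse_from_env_sketch(env_params_sketch & params) {
    get_env("LLAMA_ARG_CACHE_TYPE_K",     params.cache_type_k);
    get_env("LLAMA_ARG_CACHE_TYPE_V",     params.cache_type_v);
    get_env("LLAMA_ARG_MLOCK",            params.use_mlock);
    get_env("LLAMA_ARG_K_CACHE_HADAMARD", params.k_cache_hadamard);
}
```

Overloading get_env() on the target type keeps each binding a one-liner, which matches how the commit describes extending the existing env-var list.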
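The "env vars as defaults, CLI flags override" rule follows from call order: apply the environment first, then let the argument parser overwrite whatever it sees on the command line. The sketch below assumes that ordering. cli_params_sketch, apply_env(), apply_cli() and parse_sketch() are hypothetical, and apply_cli() is a toy parser that only understands -ctk/--cache-type-k and --mlock; it does not reflect llama.cpp's real argument handling.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical two-field parameter struct for illustration only.
struct cli_params_sketch {
    std::string cache_type_k = "f16";
    bool        use_mlock    = false;
};

// Step 1: environment variables overwrite the built-in defaults.
static void apply_env(cli_params_sketch & p) {
    if (const char * v = std::getenv("LLAMA_ARG_CACHE_TYPE_K")) {
        p.cache_type_k = v;
    }
    if (const char * v = std::getenv("LLAMA_ARG_MLOCK")) {
        const std::string s(v);
        p.use_mlock = (s == "1" || s == "true");
    }
}

// Step 2: a toy flag parser; returns false on an unknown flag or missing value.
static bool apply_cli(const std::vector<std::string> & args, cli_params_sketch & p) {
    for (std::size_t i = 0; i < args.size(); ++i) {
        if (args[i] == "-ctk" || args[i] == "--cache-type-k") {
            if (i + 1 >= args.size()) {
                return false;
            }
            p.cache_type_k = args[++i];
        } else if (args[i] == "--mlock") {
            p.use_mlock = true;
        } else {
            return false;
        }
    }
    return true;
}

// Env first (defaults), CLI second (overrides): the later write wins.
static bool parse_sketch(const std::vector<std::string> & args, cli_params_sketch & p) {
    apply_env(p);
    return apply_cli(args, p);
}
```

With LLAMA_ARG_CACHE_TYPE_K=q8_0 exported, calling parse_sketch() with no arguments leaves cache_type_k as "q8_0", while passing -ctk q4_0 yields "q4_0".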
Co-authored-by: Pipboyguy <>