API + Model: Add new parameters and clean up documentation

The example JSON fields changed as a side effect of the new sampler
default strategy. Fix them by setting the example values manually.

Also add support for fasttensors and expose generate_window to
the API. Adjusting generate_window is not recommended, since it is
dynamically scaled based on max_seq_len by default.

Signed-off-by: kingbri <bdashore3@proton.me>
kingbri
2024-01-25 00:11:30 -05:00
committed by Brian Dashore
parent 90fb41a77a
commit fc4570220c
4 changed files with 45 additions and 10 deletions

@@ -97,6 +97,9 @@ model:
# WARNING: This flag disables Flash Attention! (a stopgap fix until it's fixed in upstream)
#use_cfg: False
# Enables fasttensors to possibly increase model loading speeds (default: False)
#fasttensors: true
# Options for draft models (speculative decoding). This will use more VRAM!
#draft:
# Overrides the directory to look for draft (default: models)