Adds DRY support based on the current exl2 dev API. The only change,
made for optimization, is using dry_max_ngram instead of a closed range.
Currently, DRY range is aliased to dry_max_ngram.
Signed-off-by: kingbri <bdashore3@proton.me>
The config categories can be separated explicitly while preserving
the dynamic nature of adding new config options, by making all the
internal class vars dictionaries.
This was necessary because the stored global callbacks captured the state
of the previous global_config var, which wasn't populated.
Signed-off-by: kingbri <bdashore3@proton.me>
If a user requesting a model change isn't an admin, raise an error.
It's also better to place the load function before the generate functions.
Signed-off-by: kingbri <bdashore3@proton.me>
These have to be merged beforehand and the updated version needs to be
re-fetched. It's possible to prevent the fetch of draft_args at the
beginning of init.
Signed-off-by: kingbri <bdashore3@proton.me>
Using "auto" for rope alpha removes ambiguity about how to explicitly
enable automatic rope calculation. The existing behavior of None -> auto
calculate still applies, but can be overridden if a model's tabby_config.yml
includes `rope_alpha`.
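
The resolution order described above could be sketched as follows. This is
an illustration only: `resolve_rope_alpha`, its parameters, and the
`calculate` callback are hypothetical names, not the project's actual API.

```python
from typing import Callable, Optional, Union

AlphaInput = Union[float, str, None]


def resolve_rope_alpha(
    user_value: AlphaInput,
    model_value: AlphaInput = None,
    calculate: Callable[[], float] = lambda: 1.0,
) -> float:
    """Resolve rope alpha (hypothetical helper, for illustration only).

    user_value:  value from the API request or user config
    model_value: `rope_alpha` from the model's tabby_config.yml, if any
    calculate:   stand-in for the automatic rope alpha calculation
    """
    # "auto" explicitly requests automatic calculation.
    if user_value == "auto":
        return calculate()

    # An explicit number from the user always wins.
    if user_value is not None:
        return float(user_value)

    # None: defer to the model's override when it provides a number.
    if model_value is not None and model_value != "auto":
        return float(model_value)

    # Nothing set anywhere: None -> auto calculate.
    return calculate()
```

With this shape, passing "auto" always triggers the calculation, while None
only does so when the model-level file has nothing to say.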
Signed-off-by: kingbri <bdashore3@proton.me>
It's best to pass them down the config stack.
API/User config.yml -> model config.yml -> model config.json -> fallback.
Doing this lets values flow through seamlessly and yields control to each
member of the stack.
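
A minimal sketch of such a stack lookup, assuming each layer has already been
parsed into a dictionary. The function name `resolve_option` and the sample
keys are hypothetical and only illustrate the ordering.

```python
from typing import Any, Mapping, Optional, Sequence


def resolve_option(
    key: str,
    layers: Sequence[Optional[Mapping[str, Any]]],
    fallback: Any = None,
) -> Any:
    """Return the first non-None value for `key`, walking the stack in order.

    `layers` is ordered from highest to lowest priority, e.g.
    [user config.yml, model config.yml, model config.json].
    """
    for layer in layers:
        # A missing layer or an unset key yields control to the next member.
        if layer is not None and layer.get(key) is not None:
            return layer[key]
    return fallback


# Illustrative values only.
user_config = {"max_seq_len": None}
model_yaml = {"max_seq_len": 8192}
model_json = {"max_seq_len": 4096, "rope_theta": 10000.0}

stack = [user_config, model_yaml, model_json]
max_seq_len = resolve_option("max_seq_len", stack, fallback=2048)
```

Here the user layer leaves `max_seq_len` unset, so the model-level YAML value
is used before config.json or the fallback is consulted.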
Signed-off-by: kingbri <bdashore3@proton.me>
Like config.json in a model folder, a tabby_config.yml serves as a
layer between user-provided kwargs and the config.json values.
Signed-off-by: kingbri <bdashore3@proton.me>
Storing a pathlib type makes it easier to manipulate the model
directory path in the long run without constantly fetching it
from the config.
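
As a rough illustration of the benefit, a container that stores the directory
as a `pathlib.Path` can derive related paths directly. The class and method
names below are hypothetical, not the project's actual code.

```python
from pathlib import Path
from typing import Union


class ModelContainerSketch:
    """Hypothetical container that keeps the model directory as a Path."""

    def __init__(self, model_directory: Union[str, Path]):
        # Convert once at construction; later code reuses the stored Path.
        self.model_dir = Path(model_directory)

    def config_path(self) -> Path:
        # The / operator joins path segments in a platform-aware way.
        return self.model_dir / "config.json"

    def override_path(self) -> Path:
        return self.model_dir / "tabby_config.yml"

    def model_name(self) -> str:
        # The final path component doubles as a display name.
        return self.model_dir.name


container = ModelContainerSketch("models/my-model")
```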
Signed-off-by: kingbri <bdashore3@proton.me>
* Add healthcheck
- localhost only /healthcheck endpoint
- cURL healthcheck in docker compose file
* Update Healthcheck Response
- change endpoint to /health
- remove localhost restriction
- add docstring
* Move healthcheck definition to top of the file
- make the healthcheck show up first in the OpenAPI spec
* Tree: Format
Use the tensor parallel loader when the flag is enabled. The new loader
has its own autosplit implementation, so gpu_split_auto isn't valid
here.
Also simplify how the cache type is selected, replacing the
multiple if/else statements.
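
One common way to replace a chain of if/else statements is a lookup table.
The sketch below assumes cache modes are named by strings such as "FP16" or
"Q4"; the placeholder classes and `select_cache_class` are hypothetical and
stand in for the real cache implementations.

```python
class CacheFP16:
    """Placeholder for a full-precision cache class."""


class CacheQ8:
    """Placeholder for an 8-bit quantized cache class."""


class CacheQ4:
    """Placeholder for a 4-bit quantized cache class."""


# One table instead of branching on each mode in turn.
CACHE_TYPES = {
    "FP16": CacheFP16,
    "Q8": CacheQ8,
    "Q4": CacheQ4,
}


def select_cache_class(cache_mode: str) -> type:
    """Map a cache mode string to its class, rejecting unknown modes."""
    try:
        return CACHE_TYPES[cache_mode.upper()]
    except KeyError:
        valid = ", ".join(sorted(CACHE_TYPES))
        raise ValueError(
            f"Unknown cache mode {cache_mode!r}; expected one of: {valid}"
        ) from None
```

Adding a new cache type then means adding one dictionary entry rather than
another branch.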
Signed-off-by: kingbri <bdashore3@proton.me>
At least one bind mount is required in the volumes YAML block; otherwise,
the Docker build fails. The models directory should be a fine default since
it always exists.
Signed-off-by: kingbri <bdashore3@proton.me>