tabbyAPI

mirror of https://github.com/theroyallab/tabbyAPI.git synced 2026-03-15 00:07:28 +00:00

Files

kingbri 5055a98e41 Model: Wrap load in inference_mode

Some tensors were being taken out of inference mode during each
iteration of exllama's load_autosplit_gen. This causes errors since
autograd is off.

Therefore, give the shared load_gen_sync function an overarching
inference_mode context to prevent errors in the forward pass. This
should allow the generator to be iterated across each thread call.

Signed-off-by: kingbri <bdashore3@proton.me>

2024-03-21 18:06:50 -04:00
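
The change described in the commit message above can be illustrated with a minimal sketch. This is not the code from the commit: `load_steps` is a hypothetical stand-in for a chunked loader such as exllama's load_autosplit_gen, and this `load_gen_sync` only borrows the name from the message. The sketch assumes PyTorch is installed and shows that tensors created while the outer generator holds a single `torch.inference_mode()` context are inference tensors on every iteration.

```python
import torch


def load_steps(num_steps):
    """Hypothetical stand-in for a chunked model load that yields progress."""
    for step in range(1, num_steps + 1):
        # A tensor allocated while inference mode is active is an inference tensor.
        weights = torch.ones(2, 2)
        yield step, weights.is_inference()


def load_gen_sync(num_steps):
    """Drive the inner generator under one overarching inference_mode context."""
    with torch.inference_mode():
        yield from load_steps(num_steps)


# Every step of the wrapped generator sees inference mode enabled.
results = list(load_gen_sync(3))

# Without the wrapper, the same loader allocates ordinary tensors.
unwrapped = list(load_steps(3))

print(results)
print(unwrapped)
```

Wrapping at the outermost generator keeps the context in one place, so the inner loader does not need to manage inference mode itself.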

exllamav2 | Model: Wrap load in inference_mode | 2024-03-21 18:06:50 -04:00