tabbyAPI/backends
kingbri 5055a98e41 Model: Wrap load in inference_mode
Some tensors were being taken out of inference mode during each
iteration of exllama's load_autosplit_gen. This causes errors since
autograd is off.

Therefore, give the shared load_gen_sync function an overarching
inference_mode context to prevent errors in the forward pass. This should
allow the generator to iterate across each thread call.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-21 18:06:50 -04:00
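
The thread-related behavior the commit message describes can be illustrated with a stdlib-only sketch. It relies on one documented fact: torch.inference_mode is thread-local. Everything else is an assumption made for illustration: fake_inference_mode, load_steps, wrapped_load_steps, and drive_across_threads are hypothetical names, not tabbyAPI or exllama code, and a plain thread-local flag stands in for torch so the example runs without a GPU or a model. The sketch shows that a context entered in one thread is not visible when a generator is resumed on another thread, and that re-entering the context around every resume keeps the flag set.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from contextlib import contextmanager

# Hypothetical stand-in for a thread-local mode flag such as inference mode.
_state = threading.local()


def flag_enabled():
    """Return True if the mode flag is set in the *current* thread."""
    return getattr(_state, "enabled", False)


@contextmanager
def fake_inference_mode():
    """Set the thread-local flag for the duration of the with-block."""
    previous = flag_enabled()
    _state.enabled = True
    try:
        yield
    finally:
        _state.enabled = previous


def load_steps(num_steps):
    """Hypothetical loader: yields (step, flag seen while doing that step)."""
    for step in range(num_steps):
        yield step, flag_enabled()


def wrapped_load_steps(num_steps):
    """Re-enter the context around every resume of the inner generator."""
    gen = load_steps(num_steps)
    while True:
        with fake_inference_mode():
            try:
                item = next(gen)
            except StopIteration:
                return
        # The context is exited before suspending, so no state leaks
        # into whichever thread happens to resume this generator next.
        yield item


def drive_across_threads(gen):
    """Run each next() call on a fresh worker thread and collect results."""
    results = []
    sentinel = object()
    while True:
        with ThreadPoolExecutor(max_workers=1) as pool:
            item = pool.submit(next, gen, sentinel).result()
        if item is sentinel:
            return results
        results.append(item)


if __name__ == "__main__":
    # Context entered only in the calling thread: workers do not see it.
    with fake_inference_mode():
        print(drive_across_threads(load_steps(3)))
    # Context re-entered on every resume: every step sees the flag.
    print(drive_across_threads(wrapped_load_steps(3)))
```

In this sketch the first call reports the flag as unset at every step, because each step runs on a worker thread that never entered the context, while the wrapped generator reports it as set at every step. Whether tabbyAPI applies the context as a decorator or as a with-block is not stated in the commit message, so the wrapper here is only one possible arrangement.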