Mirror of https://github.com/theroyallab/tabbyAPI.git (synced 2026-03-15 00:07:28 +00:00)
Some tensors were being taken out of inference mode during each iteration of exllama's load_autosplit_gen. This causes errors because autograd is off. Therefore, give the shared load_gen_sync function an overarching inference_mode context to prevent errors in the forward pass. This should allow the generator to be iterated across separate thread calls.

Signed-off-by: kingbri <bdashore3@proton.me>
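A minimal sketch of why the placement of the inference_mode context matters, assuming the generator is advanced one step at a time from different threads (as when each step is handed to a worker thread). PyTorch keeps grad/inference mode as thread-local state, so a `with torch.inference_mode():` block entered once inside a generator body only covers the thread that ran the first step. Decorating the generator function with `@torch.inference_mode()` instead makes PyTorch re-enter the mode around every resumption, in whichever thread calls `next()`. All function names below (`load_gen_inner`, `load_gen_sync_with_block`, `load_gen_sync_decorated`, `step_in_new_thread`, `run`) are hypothetical illustrations, not tabbyAPI's or exllama's actual code.

```python
import threading

import torch


def step_in_new_thread(gen):
    """Advance `gen` by one step from a freshly created thread (hypothetical helper)."""
    result = {}

    def worker():
        result["value"] = next(gen)

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    return result["value"]


def load_gen_inner(num_modules):
    # Hypothetical stand-in for a loader that yields once per module.
    # Each step reports whether inference mode is on in the thread that resumed it.
    for _ in range(num_modules):
        yield torch.is_inference_mode_enabled()


def load_gen_sync_with_block(num_modules):
    # Context entered once inside the body: it only affects the thread
    # that happened to run the first step.
    with torch.inference_mode():
        yield from load_gen_inner(num_modules)


@torch.inference_mode()
def load_gen_sync_decorated(num_modules):
    # Decorating a generator function makes PyTorch re-enter inference mode
    # around every resumption, in whichever thread calls next().
    yield from load_gen_inner(num_modules)


def run(gen_fn, num_modules=3):
    # Advance each step from a different, newly created thread.
    gen = gen_fn(num_modules)
    return [step_in_new_thread(gen) for _ in range(num_modules)]


if __name__ == "__main__":
    print("with-block:", run(load_gen_sync_with_block))
    print("decorated: ", run(load_gen_sync_decorated))
```

With the context inside the body, only the first step sees inference mode enabled; with the decorator, every step does, regardless of which thread resumes the generator.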