* prevent (Managed by Forge) settings from reverting to default
* save CLIP skip to config when it is changed; saving behaviour was inconsistent / unpredictable otherwise
- `/sdapi/v1/options` GET now calls `get_config()` from the **sysinfo** module instead of its own copy of the function.
- Defined a new, flexible and more robust `set_config()` function in the **sysinfo** module (a minimal sketch follows this list), which:
- obsoletes redundant code
- skips updating values that are unchanged
- has flexible args for both API and UI use
- `/sdapi/v1/options` POST and `override_settings` now use the new `set_config()` function. `set_config()` could possibly obsolete additional functions, but I'm not going to get into that just yet.
- Options for `forge_additional_modules` can now be provided either as the file path or just the module name.
- Most importantly, `refresh_model_loading_parameters()` is now only called ONCE per request, and **only** if necessary.
- It is now much easier to call `shared.opts.save()` as needed.
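For illustration only, here is a minimal sketch of the kind of `set_config()` described above. The helper `module_name_to_path()`, the exact keys that trigger a model refresh, and the import location of `refresh_model_loading_parameters()` are assumptions; the real function in **sysinfo** may differ.

```python
# Illustrative sketch only -- helper names and trigger keys are assumptions, not Forge's actual code.
import os
from modules import shared
from modules_forge.main_entry import refresh_model_loading_parameters  # assumed import location


def module_name_to_path(name: str) -> str:
    # hypothetical resolver: map a bare module name to a file under the models folder
    return os.path.join(shared.models_path, 'text_encoder', name)


def set_config(updates: dict, is_api: bool = False, save: bool = True) -> list[str]:
    """Apply option updates, skipping values that are unchanged."""
    changed = []
    needs_model_refresh = False

    for key, value in updates.items():
        if key == 'forge_additional_modules' and value:
            # accept either full file paths or bare module names
            value = [v if os.path.isabs(v) else module_name_to_path(v) for v in value]
        if shared.opts.data.get(key) == value:
            continue  # skip values that have not changed
        shared.opts.set(key, value, is_api=is_api)
        changed.append(key)
        if key in ('sd_model_checkpoint', 'forge_additional_modules'):  # assumed trigger keys
            needs_model_refresh = True

    if needs_model_refresh:
        refresh_model_loading_parameters()  # at most once per request, and only when needed
    if save and changed:
        shared.opts.save(shared.config_filename)
    return changed
```

Under this shape, the `/sdapi/v1/options` POST handler and `override_settings` can both funnel through the same entry point.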
adds options for user-set defaults for Sampler and Scheduler to the sd, xl and flux UI settings;
adds options for user-set defaults for GPU Weights to the xl and flux UI settings;
this necessitates switching `ui_forge_inference_memory` from the `.release` event listener to `.input`, which may be more correct anyway (see the sketch below).
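For context, a toy Gradio example (not Forge's actual UI code; the label, range and handler are made up) showing the listener swap: `.release` only fires when the user lets go of the slider, while `.input` fires whenever the user changes the value, including typing a number.

```python
import gradio as gr


def on_memory_change(value):
    # placeholder handler; Forge recomputes the GPU weights / inference memory split here
    return f"inference memory set to {value} MB"


with gr.Blocks() as demo:
    memory = gr.Slider(0, 24576, value=1024, step=1, label="GPU Weights (MB)")
    status = gr.Textbox(label="status")
    # previously: memory.release(...) -- only fired when the mouse was released on the slider
    memory.input(on_memory_change, inputs=memory, outputs=status)

demo.launch()
```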
Added Flux to the lora types in the extra networks UI, so the user can set it.
Loras are versioned first by the user-set type, if any, then fall back to heuristics; these are much more reliable than the removed old A1111 tests and, when nothing matches, default to Unknown (always displayed).
Filtering is based on the UI setting; the 'all' setting does not filter. Lora lists are re-filtered when the setting changes (a minimal sketch follows below).
Removed unused 'lora_hide_unknown_for_versions' setting.
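Purely as an illustration (the key patterns are simplified guesses, not the actual heuristics, and `detect_lora_version` / `filter_loras` are hypothetical names), the resolution order and filtering described above might look like this:

```python
# Illustrative sketch only; the state-dict key patterns are simplified guesses.
def detect_lora_version(user_type: str | None, state_dict_keys: list[str]) -> str:
    if user_type and user_type != "Unknown":
        return user_type  # a user-set type always wins
    joined = " ".join(state_dict_keys)
    if "double_blocks" in joined or "single_blocks" in joined:
        return "Flux"
    if "text_encoder_2" in joined:
        return "SDXL"
    if "lora_unet" in joined or "text_encoder" in joined:
        return "SD1"
    return "Unknown"  # always displayed regardless of the active filter


def filter_loras(loras: dict[str, str], ui_filter: str) -> dict[str, str]:
    """loras maps name -> detected version; 'all' disables filtering."""
    if ui_filter == "all":
        return loras
    return {name: ver for name, ver in loras.items() if ver in (ui_filter, "Unknown")}
```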
1. Add an option to allow users to run the UNet in fp8/gguf but keep loras in fp16.
2. FP16 loras no longer need patching. Others are only re-patched when the lora weight changes.
3. FP8 UNet + FP16 lora is now available in Forge (and, for now, essentially only in Forge). This also solves some “LoRA too subtle” problems.
4. Significantly speed up all gguf models (in Async mode) by using an independent thread (CUDA stream) to compute and dequantize at the same time, even when the low-bit weights are already on the GPU (see the sketch after this list).
5. Treat “online lora” as a module similar to ControlLoRA, so it is moved to the GPU together with the model when sampling, achieving a significant speedup and perfect low-VRAM management at the same time.
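This is not Forge's implementation, but a rough PyTorch sketch of the idea in item 4, assuming each layer exposes a GPU-resident quantized weight tensor (`.qweight` is a made-up attribute and `dequantize` is a placeholder): dequantize the next layer's weights on a side CUDA stream while the current layer computes on the default stream.

```python
import torch

dequant_stream = torch.cuda.Stream()


def dequantize(qweight: torch.Tensor) -> torch.Tensor:
    # placeholder: real gguf dequant kernels depend on the quantization type
    return qweight.to(torch.float16)


def forward_linears(layers, x):
    """layers: objects with a GPU-resident .qweight tensor; x: activations on the GPU."""
    current = dequantize(layers[0].qweight)
    for i, layer in enumerate(layers):
        nxt = None
        if i + 1 < len(layers):
            # kick off dequantization of the next layer's weights on the side stream
            with torch.cuda.stream(dequant_stream):
                nxt = dequantize(layers[i + 1].qweight)
        # compute with the already-dequantized weights on the default stream
        x = torch.nn.functional.linear(x, current)
        # keep the streams ordered so the prefetched weights are ready for the next iteration
        torch.cuda.current_stream().wait_stream(dequant_stream)
        current = nxt
    return x
```

The `wait_stream` call is what keeps the two streams ordered, so the default stream never reads weights the side stream has not finished producing.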