13 Commits

Author SHA1 Message Date
Panchovix
c13b26ba27 Rephrase low GPU warning (#1761)
Emphasize that it is a performance degradation, and that it applies to the diffusion process.
2024-09-09 14:30:22 -03:00
layerdiffusion
33963f2d19 always compute on-the-fly lora weights when offload 2024-08-31 11:24:23 -07:00
layerdiffusion
d1d0ec46aa Maintenance of patching-related code
1. Fix several problems related to layerdiffuse not being unloaded
2. Fix several problems related to Fooocus inpaint
3. Slightly speed up on-the-fly LoRAs by precomputing them in the computation dtype
2024-08-30 15:18:21 -07:00
layerdiffusion
2ab19f7f1c revise lora patching 2024-08-22 11:59:43 -07:00
layerdiffusion
14eac6f2cf add a way to empty cuda cache on the fly 2024-08-22 10:06:39 -07:00
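A minimal sketch of what "emptying the CUDA cache on the fly" could look like, assuming the feature is a thin wrapper around PyTorch's standard cache-management calls; the helper name below is hypothetical, not the actual Forge function:

```python
import gc
import torch

def empty_cuda_cache_on_the_fly():
    """Hypothetical helper: release cached GPU memory between operations."""
    gc.collect()                   # drop unreachable Python-side tensors first
    if torch.cuda.is_available():
        torch.cuda.empty_cache()   # return cached allocator blocks to the driver
        torch.cuda.ipc_collect()   # reclaim memory held by expired IPC handles
```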
layerdiffusion
0d8eb4c5ba fix #1375 2024-08-21 11:01:59 -07:00
layerdiffusion
475524496d revise 2024-08-19 18:54:54 -07:00
layerdiffusion
d7151b4dcd add low vram warning 2024-08-19 11:08:01 -07:00
layerdiffusion
d38e560e42 Implement some rethinking about LoRA system
1. Add an option that lets users run the UNet in fp8/gguf while keeping the LoRA in fp16.
2. FP16 LoRAs never need patching; others are only re-patched when the LoRA weights change (see the sketch after this entry).
3. FP8 UNet + FP16 LoRA is now available in Forge (and, for now, more or less only in Forge). This also solves some “LoRA too subtle” problems.
4. Significantly speed up all gguf models (in Async mode) by using an independent thread (CUDA stream) to compute and dequantize at the same time, even when the low-bit weights are already on the GPU.
5. Treat the “online LoRA” as a module similar to ControlLoRA, so that it is moved to the GPU together with the model when sampling, achieving a significant speedup and perfect low-VRAM management at the same time.
2024-08-19 04:31:59 -07:00
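As a rough illustration of item 2 above (and of the later “always compute on-the-fly lora weights when offload” commit), the sketch below shows one way to apply a LoRA delta at compute time instead of patching the stored weights. It assumes a plain linear layer in PyTorch; the function and parameter names are hypothetical and this is not the actual Forge implementation:

```python
import torch

def forward_with_online_lora(x, base_weight, lora_down, lora_up, scale=1.0,
                             compute_dtype=torch.float16):
    """Apply a LoRA delta on the fly: y = x @ (W + scale * up @ down)^T.

    base_weight may be stored in a low-bit format (e.g. fp8); it is upcast to the
    computation dtype only for this call, so the stored weights are never patched.
    """
    w = base_weight.to(compute_dtype)       # dequantize / upcast for this forward pass only
    delta = (lora_up.to(compute_dtype) @ lora_down.to(compute_dtype)) * scale
    return torch.nn.functional.linear(x.to(compute_dtype), w + delta)
```

Because the low-bit base weight is only upcast inside the call, changing the LoRA scale or swapping LoRA weights never requires re-patching anything stored on the model.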
lllyasviel
a07c758658 integrate k-diffusion 2024-08-07 15:05:42 -07:00
layerdiffusion
b7878058f9 improve backward compatibility #936 2024-08-06 01:06:24 -07:00
layerdiffusion
0863765173 rework sd1.5 and sdxl from scratch 2024-08-05 03:08:17 -07:00
layerdiffusion
bb5083f3c2 rework sample function 2024-08-03 13:27:23 -07:00