Conor Nash
8bd7e0568f
Get Flux working on Apple Silicon (#1264)
Co-authored-by: Conor Nash <conor@nbs.consulting>
2024-09-13 15:40:11 +01:00
Panchovix
c13b26ba27
Rephrase low GPU warning (#1761)
Emphasize that it indicates performance degradation, and only for the current diffusion process.
2024-09-09 14:30:22 -03:00
layerdiffusion
efe6fed499
add a way to exchange variables between modules
2024-09-08 20:22:04 -07:00
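A minimal sketch of the usual way to exchange variables between modules without circular imports: a small shared-state module that both sides import. The module and function names here are illustrative, not the actual Forge API.

    # shared_state.py -- illustrative cross-module variable store
    _store = {}

    def set_value(key, value):
        # Publish a value for other modules to read.
        _store[key] = value

    def get_value(key, default=None):
        # Read a value published by another module.
        return _store.get(key, default)

Any two modules can then communicate by importing shared_state instead of importing each other.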
layerdiffusion
f40930c55b
fix
2024-09-08 17:24:53 -07:00
layerdiffusion
44eb4ea837
Support T5 & CLIP Text Encoder LoRA from OneTrainer
requested in #1727
and some cleanups/licenses
PS: LoRA requests must include a download URL to at least one LoRA
2024-09-08 01:39:29 -07:00
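Whatever trainer produced the state dict, applying a text-encoder LoRA reduces to the same arithmetic: W' = W + (alpha / rank) * up @ down. A minimal sketch with hypothetical shapes; the OneTrainer-specific key remapping is omitted.

    import torch

    def apply_lora(weight, down, up, alpha):
        # W' = W + (alpha / rank) * (up @ down); rank is down's first dimension.
        rank = down.shape[0]
        delta = (alpha / rank) * (up.to(torch.float32) @ down.to(torch.float32))
        return weight + delta.to(weight.dtype)

    # Hypothetical rank-8 LoRA on a 768x768 text-encoder linear layer.
    w = torch.randn(768, 768, dtype=torch.float16)
    down, up = torch.randn(8, 768), torch.randn(768, 8)
    print(apply_lora(w, down, up, alpha=8.0).shape)  # torch.Size([768, 768])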
layerdiffusion
a8a81d3d77
fix offline quant lora precision
2024-08-31 13:12:23 -07:00
layerdiffusion
79b25a8235
move code
2024-08-31 11:31:02 -07:00
layerdiffusion
33963f2d19
always compute on-the-fly lora weights when offloading
2024-08-31 11:24:23 -07:00
layerdiffusion
70a555906a
use safer code
2024-08-31 10:55:19 -07:00
layerdiffusion
1f91b35a43
add signal_empty_cache
2024-08-31 10:20:22 -07:00
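A sketch of the signal pattern the name suggests: code that frees large tensors raises a flag, and the main loop clears the CUDA allocator cache at the next safe point. The names are assumptions, not Forge's actual identifiers.

    import torch

    signal_empty_cache = False  # raised by any code that frees large tensors

    def soft_empty_cache():
        # Clear the CUDA cache only when something asked for it.
        global signal_empty_cache
        if signal_empty_cache and torch.cuda.is_available():
            torch.cuda.empty_cache()
            signal_empty_cache = False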
layerdiffusion
ec7917bd16
fix
2024-08-30 15:37:15 -07:00
layerdiffusion
d1d0ec46aa
Maintain patching-related code
1. fix several problems related to layerdiffuse not being unloaded
2. fix several problems related to Fooocus inpaint
3. slightly speed up on-the-fly LoRAs by precomputing them in the computation dtype (see the sketch after this entry)
2024-08-30 15:18:21 -07:00
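For item 3 above, a sketch of the precompute: cast each LoRA delta to the computation dtype once at patch time and cache it, rather than re-casting on every sampling step. The cache layout is illustrative.

    import torch

    _delta_cache = {}  # (lora_id, dtype) -> precomputed delta; illustrative

    def get_delta(lora_id, delta_fp32, compute_dtype=torch.float16):
        key = (lora_id, compute_dtype)
        if key not in _delta_cache:
            # Cast once here; every diffusion step reuses the cached tensor.
            _delta_cache[key] = delta_fp32.to(compute_dtype)
        return _delta_cache[key]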
layerdiffusion
f04666b19b
Attempt #1575
2024-08-30 09:41:36 -07:00
layerdiffusion
4c9380c46a
Speed up quant model loading and inference
based on three observations:
1. torch.Tensor.view on one big tensor is slightly faster than calling torch.Tensor.to on multiple small tensors;
2. torch.Tensor.to with a dtype change is significantly slower than torch.Tensor.view;
3. “baking” the model on the GPU is significantly faster than computing on the CPU at model load.
Mainly influences inference of Q8_0, Q4_0/1/K and the loading of all quants (a benchmark sketch follows this entry).
2024-08-30 00:49:05 -07:00
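The second observation is easy to reproduce with a micro-benchmark (CUDA assumed; absolute numbers vary by hardware): torch.Tensor.view only rewrites metadata, while torch.Tensor.to with a dtype change allocates new memory and launches a cast kernel.

    import time
    import torch

    x = torch.randn(1024, 1024, dtype=torch.float16, device="cuda")

    def bench(fn, n=1000):
        torch.cuda.synchronize()
        t0 = time.perf_counter()
        for _ in range(n):
            fn()
        torch.cuda.synchronize()
        return time.perf_counter() - t0

    print("view        :", bench(lambda: x.view(-1)))           # metadata only
    print("to(float32) :", bench(lambda: x.to(torch.float32)))  # copy + cast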
layerdiffusion
3d62fa9598
reduce prints
2024-08-29 20:17:32 -07:00
layerdiffusion
95e16f7204
maintain loading-related code
1. revise model moving order
2. less verbose printing
3. some misc minor speedups
4. some bnb-related maintenance
2024-08-29 19:05:48 -07:00
layerdiffusion
d339600181
fix
2024-08-28 09:56:18 -07:00
layerdiffusion
81d8f55bca
support PyTorch 2.4's new normalization features
2024-08-28 09:08:26 -07:00
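PyTorch 2.4 added a native torch.nn.RMSNorm module (and torch.nn.functional.rms_norm). A typical guarded-usage sketch; the fallback below is a generic RMSNorm, not necessarily the one Forge ships.

    import torch

    if hasattr(torch.nn, "RMSNorm"):  # available since PyTorch 2.4
        norm = torch.nn.RMSNorm(768)
    else:
        class RMSNorm(torch.nn.Module):
            def __init__(self, dim, eps=1e-6):
                super().__init__()
                self.eps = eps
                self.weight = torch.nn.Parameter(torch.ones(dim))

            def forward(self, x):
                rms = x.float().pow(2).mean(-1, keepdim=True).add(self.eps).rsqrt()
                return (x.float() * rms).to(x.dtype) * self.weight

        norm = RMSNorm(768)

    print(norm(torch.randn(2, 768)).shape)  # torch.Size([2, 768])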
layerdiffusion
0abb6c4686
Second Attempt for #1502
2024-08-28 08:08:40 -07:00
layerdiffusion
f22b80ef94
restrict baking to 16 bits
2024-08-26 06:16:13 -07:00
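A sketch of what such a restriction typically looks like: only pre-cast ("bake") weights when the target is a 16-bit float, leaving quantized and 32-bit targets to on-the-fly handling. The allow-list and function name are assumptions.

    import torch

    BAKE_DTYPES = (torch.float16, torch.bfloat16)  # 16-bit floats only

    def maybe_bake(weight, target_dtype):
        # Pre-cast only to 16-bit dtypes; otherwise leave the weight untouched.
        if target_dtype in BAKE_DTYPES:
            return weight.to(target_dtype)
        return weight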
layerdiffusion
388b70134b
fix offline loras
2024-08-25 20:28:40 -07:00
layerdiffusion
b25b62da96
fix T5 not baked
2024-08-25 17:31:50 -07:00
layerdiffusion
cae37a2725
fix dequant of unbaked parameters
2024-08-25 17:24:31 -07:00
layerdiffusion
13d6f8ed90
revise GGUF by precomputing some parameters
rather than computing them in each diffusion iteration
2024-08-25 14:30:09 -07:00
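A sketch of the idea: derive what you can from the quantized blocks once at load time, so dequantize() inside the sampling loop does less work. The class and shapes are illustrative, not Forge's actual GGUF code.

    import torch

    class QuantTensor:
        def __init__(self, qblocks, scales):
            self.qblocks = qblocks            # e.g. int8 blocks, (n_blocks, block_size)
            # Precomputed once at load time instead of per diffusion iteration.
            self.scales_f32 = scales.float()  # (n_blocks, 1)

        def dequantize(self, dtype=torch.float16):
            return (self.qblocks.float() * self.scales_f32).to(dtype)

    q = QuantTensor(torch.randint(-8, 8, (4, 32), dtype=torch.int8), torch.rand(4, 1))
    print(q.dequantize().shape)  # torch.Size([4, 32])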
lllyasviel
f82029c5cf
support more t5 quants (#1482)
let's hope this is the last time people randomly invent new state dict key formats
2024-08-24 12:47:49 -07:00
layerdiffusion
f23ee63cb3
always set the empty-cache signal whenever any patch happens
2024-08-23 08:56:57 -07:00
layerdiffusion
2ab19f7f1c
revise lora patching
2024-08-22 11:59:43 -07:00
layerdiffusion
68bf7f85aa
speed up nf4 lora in offline patching mode
2024-08-22 10:35:11 -07:00
layerdiffusion
95d04e5c8f
fix
2024-08-22 10:08:21 -07:00
layerdiffusion
14eac6f2cf
add a way to empty the CUDA cache on the fly
2024-08-22 10:06:39 -07:00
layerdiffusion
909ad6c734
fix prints
2024-08-21 22:24:54 -07:00
layerdiffusion
0d8eb4c5ba
fix #1375
2024-08-21 11:01:59 -07:00
layerdiffusion
4e3c78178a
[revised] change some dtype behaviors based on community feedback
Only influences old devices like the GTX 1080/1070/1060/1050.
Please remove your cmd flags if you are on one of these devices and previously used many of them to tune performance.
2024-08-21 10:23:38 -07:00
layerdiffusion
1419ef29aa
Revert "change some dtype behaviors based on community feedback"
This reverts commit 31bed671ac.
2024-08-21 10:10:49 -07:00
layerdiffusion
31bed671ac
change some dtype behaviors based on community feedback
Only influences old devices like the GTX 1080/1070/1060/1050.
Please remove your cmd flags if you are on one of these devices and previously used many of them to tune performance.
2024-08-21 08:46:52 -07:00
layerdiffusion
1096c708cc
revise swap module name
2024-08-20 21:18:53 -07:00
layerdiffusion
5452bc6ac3
All Forge Spaces Now Pass 4GB VRAM
and they all reproduce the authors' results exactly
2024-08-20 08:01:10 -07:00
layerdiffusion
6f411a4940
fix LoRAs on nf4 models when "loras in fp16" is activated
2024-08-20 01:29:52 -07:00
layerdiffusion
475524496d
revise
2024-08-19 18:54:54 -07:00
layerdiffusion
d7151b4dcd
add low vram warning
2024-08-19 11:08:01 -07:00
layerdiffusion
2f1d04759f
avoid some mysterious problems when using lots of Python local delegations
2024-08-19 09:47:04 -07:00
layerdiffusion
96f264ec6a
add a way to save models
2024-08-19 06:30:49 -07:00
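In practice this usually means writing the (possibly patched) state dict out as a .safetensors file. A minimal sketch using the safetensors package; the key and path are illustrative.

    import torch
    from safetensors.torch import save_file

    state_dict = {"model.diffusion_model.out.weight": torch.randn(4, 4)}  # illustrative
    save_file(state_dict, "baked_checkpoint.safetensors")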
layerdiffusion
d03fc5c2b1
speed up a bit
2024-08-19 05:06:46 -07:00
layerdiffusion
d38e560e42
Implement some rethinking of the LoRA system
1. Add an option to allow users to run the UNet in fp8/gguf but LoRAs in fp16.
2. FP16 LoRAs do not need patching; others are only patched again when LoRA weights change.
3. FP8 UNet + fp16 LoRA is now available in Forge (and, somewhat, only available there). This also solves some “LoRA too subtle” problems.
4. Significantly speed up all gguf models (in Async mode) by using an independent thread (CUDA stream) to compute and dequantize at the same time, even when low-bit weights are already on the GPU (see the stream sketch after this entry).
5. Treat the “online lora” as a module similar to ControlLoRA so that it is moved to the GPU together with the model when sampling, achieving significant speedup and perfect low-VRAM management simultaneously.
2024-08-19 04:31:59 -07:00
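For item 4 above, a sketch of overlapping dequantization with compute via a side CUDA stream (CUDA assumed; the dequant step is a stand-in for the real low-bit kernels):

    import torch

    compute_stream = torch.cuda.default_stream()
    dequant_stream = torch.cuda.Stream()

    def dequant_async(qweight):
        # Run dequantization on a side stream so it overlaps with ongoing compute.
        with torch.cuda.stream(dequant_stream):
            w = qweight.float()  # stand-in for the real GGUF dequant
        # Compute must wait for the dequant result before using it.
        compute_stream.wait_stream(dequant_stream)
        return w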
layerdiffusion
e5f213c21e
upload some GGUF support
2024-08-19 01:09:50 -07:00
layerdiffusion
53cd00d125
revise
2024-08-17 23:03:50 -07:00
layerdiffusion
db5a876d4c
completely solve all LoRA OOMs
2024-08-17 22:43:20 -07:00
layerdiffusion
8a04293430
fix some gguf loras
2024-08-17 01:15:37 -07:00
layerdiffusion
ab4b0d5b58
fix some mem leak
2024-08-17 00:19:43 -07:00
layerdiffusion
3da7de418a
fix layerdiffuse
2024-08-16 21:37:25 -07:00