layerdiffusion
1096c708cc
revise swap module name
2024-08-20 21:18:53 -07:00
layerdiffusion
5452bc6ac3
All Forge Spaces Now Pass 4GB VRAM
...
and they all 100% reproduce the authors' results
2024-08-20 08:01:10 -07:00
layerdiffusion
6f411a4940
fix LoRAs on NF4 models when "loras in fp16" is activated
2024-08-20 01:29:52 -07:00
layerdiffusion
475524496d
revise
2024-08-19 18:54:54 -07:00
layerdiffusion
d7151b4dcd
add low vram warning
2024-08-19 11:08:01 -07:00
layerdiffusion
2f1d04759f
avoid some mysterious problems when using lots of Python local delegations
2024-08-19 09:47:04 -07:00
layerdiffusion
96f264ec6a
add a way to save models
2024-08-19 06:30:49 -07:00
layerdiffusion
d03fc5c2b1
speed up a bit
2024-08-19 05:06:46 -07:00
layerdiffusion
d38e560e42
Implement some rethinking about LoRA system
...
1. Add an option to let users run the UNet in fp8/gguf but LoRAs in fp16.
2. FP16 LoRAs do not need patching. Others are only re-patched when LoRA weights change.
3. FP8 UNet + fp16 LoRA is now available in Forge (and, for now, essentially only in Forge). This also solves some "LoRA too subtle" problems.
4. Significantly speed up all gguf models (in Async mode) by using an independent thread (CUDA stream) to compute and dequantize at the same time, even when low-bit weights are already on the GPU.
5. Treat "online lora" as a module similar to ControlLoRA, so that it is moved to the GPU together with the model when sampling, achieving significant speedup and perfect low-VRAM management at the same time.
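The overlap described in point 4 can be sketched in plain Python: a worker thread dequantizes the next layer's weights while the main thread computes the current layer. This is a simplified stand-in (Forge uses a separate CUDA stream for the same effect); `dequant`, `run_layers`, and the 0.5 scale are hypothetical names and values, not the actual implementation.

```python
import threading
import queue

def dequant(qweights):
    # Stand-in for real dequantization: map int weights back to
    # floats with an assumed fixed scale of 0.5.
    return [w * 0.5 for w in qweights]

def run_layers(quantized_layers, compute):
    ready = queue.Queue(maxsize=1)   # holds the next dequantized layer

    def prefetch():
        for q in quantized_layers:
            ready.put(dequant(q))    # runs concurrently with compute()
        ready.put(None)              # sentinel: no more layers

    threading.Thread(target=prefetch, daemon=True).start()
    outputs = []
    while (w := ready.get()) is not None:
        outputs.append(compute(w))   # compute layer i while i+1 dequantizes
    return outputs

layers = [[10, 20], [30, 40]]
print(run_layers(layers, compute=sum))  # [15.0, 35.0]
```

The bounded queue (`maxsize=1`) is the key design point: it keeps exactly one layer dequantized ahead of the compute loop, so low-bit weights never need to be fully expanded in memory at once.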
2024-08-19 04:31:59 -07:00
layerdiffusion
e5f213c21e
upload some GGUF supports
2024-08-19 01:09:50 -07:00
layerdiffusion
53cd00d125
revise
2024-08-17 23:03:50 -07:00
layerdiffusion
db5a876d4c
completely solve all LoRA OOMs
2024-08-17 22:43:20 -07:00
layerdiffusion
8a04293430
fix some gguf loras
2024-08-17 01:15:37 -07:00
layerdiffusion
ab4b0d5b58
fix some mem leak
2024-08-17 00:19:43 -07:00
layerdiffusion
3da7de418a
fix layerdiffuse
2024-08-16 21:37:25 -07:00
layerdiffusion
9973d5dc09
better prints
2024-08-16 21:13:09 -07:00
layerdiffusion
f3e211d431
fix bnb lora
2024-08-16 21:09:14 -07:00
layerdiffusion
2f0555f7dc
GPU Shared Async Swap for all GGUF/BNB
2024-08-16 08:45:17 -07:00
layerdiffusion
04e7f05769
speedup swap/loading of all quant types
2024-08-16 08:30:11 -07:00
layerdiffusion
394da01959
simplify
2024-08-16 04:55:01 -07:00
layerdiffusion
e36487ffa5
tune
2024-08-16 04:49:25 -07:00
lllyasviel
6e6e5c2162
do some profile on 3090
2024-08-16 04:43:19 -07:00
layerdiffusion
7c0f78e424
reduce cast
2024-08-16 03:59:59 -07:00
layerdiffusion
12369669cf
only load lora one time
2024-08-16 02:02:22 -07:00
layerdiffusion
243952f364
wip qx_1 loras
2024-08-15 17:07:41 -07:00
layerdiffusion
f510f51303
speedup lora patching
2024-08-15 06:51:52 -07:00
layerdiffusion
141cf81c23
sometimes it is not a diffusion model
2024-08-15 06:36:59 -07:00
layerdiffusion
021428da26
fix NF4 LoRA giving pure noise on some devices
2024-08-15 06:35:15 -07:00
layerdiffusion
3d751eb69f
move file
2024-08-15 05:46:35 -07:00
layerdiffusion
616b335fce
move file
2024-08-15 05:45:55 -07:00
layerdiffusion
1bd6cf0e0c
Support LoRAs for Q8/Q5/Q4 GGUF Models
...
what a crazy night of math
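The core of patching a LoRA onto a quantized weight block can be sketched as: dequantize, add the LoRA delta, re-quantize. The sketch below uses a simplified symmetric Q8-style scheme with a single per-tensor scale; all names (`quantize_q8`, `apply_lora`) and the block layout are hypothetical simplifications, not Forge's actual GGUF code.

```python
def quantize_q8(ws):
    # Symmetric quantization: map floats to int8 range [-127, 127]
    # with one shared scale (real GGUF uses per-block scales).
    scale = max(abs(w) for w in ws) / 127 or 1.0
    return [round(w / scale) for w in ws], scale

def dequantize(qs, scale):
    return [q * scale for q in qs]

def apply_lora(qs, scale, delta, alpha=1.0):
    # Merge the LoRA delta in float space, then re-quantize.
    merged = [w + alpha * d for w, d in zip(dequantize(qs, scale), delta)]
    return quantize_q8(merged)

qs, s = quantize_q8([0.5, -1.0, 0.25])
qs2, s2 = apply_lora(qs, s, delta=[0.1, 0.0, -0.05])
print(dequantize(qs2, s2))  # close to [0.604, -1.0, 0.202], within one quant step
```

Each round trip through `quantize_q8` introduces error up to half a quantization step, which is why patching must happen once against the original weights rather than repeatedly in place.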
2024-08-15 05:34:46 -07:00
layerdiffusion
2690b654fd
reimplement q8/q5/q4; review and match official gguf
2024-08-15 02:41:15 -07:00
layerdiffusion
ce16d34d03
disable xformers for t5
2024-08-15 00:55:49 -07:00
layerdiffusion
d336597fa5
add note to lora
...
but LoRAs for NF4 are done already!
2024-08-15 00:42:48 -07:00
layerdiffusion
7fcfb93090
ling
2024-08-15 00:39:12 -07:00
layerdiffusion
0524133714
ling
2024-08-15 00:33:21 -07:00
layerdiffusion
fb62214a32
rewrite some functions
2024-08-15 00:29:19 -07:00
layerdiffusion
c74f603ea2
remove super call
2024-08-15 00:23:31 -07:00
layerdiffusion
d0518b7249
make prints beautiful
2024-08-15 00:20:03 -07:00
layerdiffusion
d8b83a9501
gguf preview
2024-08-15 00:03:32 -07:00
layerdiffusion
59790f2cb4
simplify codes
2024-08-14 20:48:39 -07:00
layerdiffusion
4b66cf1126
fix possible OOM again
2024-08-14 20:45:58 -07:00
layerdiffusion
a29875206f
Revert "simplify codes"
...
This reverts commit e7567efd4b.
2024-08-14 20:39:05 -07:00
layerdiffusion
b31f81628f
Revert "simplify codes"
...
This reverts commit 2cc5aa7a3e.
2024-08-14 20:39:00 -07:00
layerdiffusion
2cc5aa7a3e
simplify codes
2024-08-14 20:35:28 -07:00
layerdiffusion
e7567efd4b
simplify codes
2024-08-14 20:34:02 -07:00
layerdiffusion
bbd0d76b28
fix possible oom
2024-08-14 20:27:05 -07:00
layerdiffusion
cb889470ba
experimental LoRA support for NF4 models
...
method may change later depending on result quality
2024-08-14 19:52:19 -07:00
layerdiffusion
70a5acd8ad
doc
2024-08-14 19:12:02 -07:00
layerdiffusion
aff742b597
speed up LoRA using CUDA profiling
2024-08-14 19:09:35 -07:00