Commit Graph

180 Commits

Author SHA1 Message Date
layerdiffusion
68bf7f85aa speed up nf4 lora in offline patching mode 2024-08-22 10:35:11 -07:00
layerdiffusion
95d04e5c8f fix 2024-08-22 10:08:21 -07:00
layerdiffusion
14eac6f2cf add a way to empty cuda cache on the fly 2024-08-22 10:06:39 -07:00
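The commit above adds a way to empty the CUDA cache on the fly. In a PyTorch-based UI such as Forge, that action typically reduces to a small helper like the following sketch (the function name is hypothetical, and `torch` is imported lazily so the sketch also runs on CPU-only machines):

```python
import gc

def empty_cuda_cache_on_the_fly():
    """Hypothetical helper: free cached GPU memory without restarting.

    Returns True if the CUDA cache was actually emptied.
    """
    gc.collect()  # drop unreachable Python tensors so their blocks become free
    try:
        import torch
        if torch.cuda.is_available():
            # hand freed allocator blocks back to the CUDA driver
            torch.cuda.empty_cache()
            return True
    except ImportError:
        pass
    return False
```

Note that `empty_cache()` only releases blocks the caching allocator has already freed; it does not reclaim memory still held by live tensors, which is why the `gc.collect()` comes first.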
layerdiffusion
909ad6c734 fix prints 2024-08-21 22:24:54 -07:00
layerdiffusion
0d8eb4c5ba fix #1375 2024-08-21 11:01:59 -07:00
layerdiffusion
4e3c78178a [revised] change some dtype behaviors based on community feedback
only influences old devices like the 1080/70/60/50.
please remove cmd flags if you are on a 1080/70/60/50 and previously used many cmd flags to tune performance
2024-08-21 10:23:38 -07:00
layerdiffusion
1419ef29aa Revert "change some dtype behaviors based on community feedback"
This reverts commit 31bed671ac.
2024-08-21 10:10:49 -07:00
layerdiffusion
31bed671ac change some dtype behaviors based on community feedback
only influences old devices like the 1080/70/60/50.
please remove cmd flags if you are on a 1080/70/60/50 and previously used many cmd flags to tune performance
2024-08-21 08:46:52 -07:00
layerdiffusion
1096c708cc revise swap module name 2024-08-20 21:18:53 -07:00
layerdiffusion
5452bc6ac3 All Forge Spaces Now Pass 4GB VRAM
and they all 100% reproduce the authors' results
2024-08-20 08:01:10 -07:00
layerdiffusion
6f411a4940 fix LoRAs on NF4 models when "loras in fp16" is activated 2024-08-20 01:29:52 -07:00
layerdiffusion
475524496d revise 2024-08-19 18:54:54 -07:00
layerdiffusion
d7151b4dcd add low vram warning 2024-08-19 11:08:01 -07:00
layerdiffusion
2f1d04759f avoid some mysterious problems when using lots of Python local delegations 2024-08-19 09:47:04 -07:00
layerdiffusion
96f264ec6a add a way to save models 2024-08-19 06:30:49 -07:00
layerdiffusion
d03fc5c2b1 speed up a bit 2024-08-19 05:06:46 -07:00
layerdiffusion
d38e560e42 Implement some rethinking of the LoRA system
1. Add an option to allow users to use the UNet in fp8/gguf but the LoRA in fp16.
2. All FP16 LoRAs do not need patching. Others will only be patched again when the LoRA weights change.
3. FP8 UNet + FP16 LoRA is now available (and arguably only available) in Forge. This also solves some "LoRA too subtle" problems.
4. Significantly speed up all GGUF models (in Async mode) by using an independent thread (CUDA stream) to compute and dequantize at the same time, even when low-bit weights are already on the GPU.
5. View "online lora" as a module similar to ControlLoRA so that it is moved to the GPU together with the model when sampling, achieving significant speedup and perfect low-VRAM management simultaneously.
2024-08-19 04:31:59 -07:00
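Point 4 of the commit above overlaps weight dequantization with compute. A minimal CPU-only sketch of that producer/consumer pattern, using a Python thread and a queue in place of a second CUDA stream (all names and the toy math are hypothetical, not Forge's actual code):

```python
import queue
import threading

def dequant(block):
    # stand-in for low-bit -> fp16 dequantization of one weight block
    return [x * 0.5 for x in block]

def compute(weights):
    # stand-in for the matmul that consumes the dequantized weights
    return float(sum(weights))

def run_pipelined(blocks):
    q = queue.Queue(maxsize=2)  # small buffer lets dequant run ahead of compute

    def producer():
        for b in blocks:
            q.put(dequant(b))  # dequantize the next block in the background...
        q.put(None)            # sentinel: no more blocks

    threading.Thread(target=producer, daemon=True).start()
    results = []
    while (w := q.get()) is not None:
        results.append(compute(w))  # ...while the "main stream" computes
    return results
```

With real CUDA streams the same shape applies: the dequant kernel for block N+1 is enqueued on a side stream while the matmul for block N runs on the default stream, so neither waits on the other.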
layerdiffusion
e5f213c21e upload some GGUF support 2024-08-19 01:09:50 -07:00
layerdiffusion
53cd00d125 revise 2024-08-17 23:03:50 -07:00
layerdiffusion
db5a876d4c completely solve all LoRA OOMs 2024-08-17 22:43:20 -07:00
layerdiffusion
8a04293430 fix some gguf loras 2024-08-17 01:15:37 -07:00
layerdiffusion
ab4b0d5b58 fix some mem leak 2024-08-17 00:19:43 -07:00
layerdiffusion
3da7de418a fix layerdiffuse 2024-08-16 21:37:25 -07:00
layerdiffusion
9973d5dc09 better prints 2024-08-16 21:13:09 -07:00
layerdiffusion
f3e211d431 fix bnb lora 2024-08-16 21:09:14 -07:00
layerdiffusion
2f0555f7dc GPU Shared Async Swap for all GGUF/BNB 2024-08-16 08:45:17 -07:00
layerdiffusion
04e7f05769 speedup swap/loading of all quant types 2024-08-16 08:30:11 -07:00
layerdiffusion
394da01959 simplify 2024-08-16 04:55:01 -07:00
layerdiffusion
e36487ffa5 tune 2024-08-16 04:49:25 -07:00
lllyasviel
6e6e5c2162 do some profiling on a 3090 2024-08-16 04:43:19 -07:00
layerdiffusion
7c0f78e424 reduce cast 2024-08-16 03:59:59 -07:00
layerdiffusion
12369669cf only load lora one time 2024-08-16 02:02:22 -07:00
layerdiffusion
243952f364 wip qx_1 loras 2024-08-15 17:07:41 -07:00
layerdiffusion
f510f51303 speedup lora patching 2024-08-15 06:51:52 -07:00
layerdiffusion
141cf81c23 sometimes it is not a diffusion model 2024-08-15 06:36:59 -07:00
layerdiffusion
021428da26 fix NF4 LoRA giving pure noise on some devices 2024-08-15 06:35:15 -07:00
layerdiffusion
3d751eb69f move file 2024-08-15 05:46:35 -07:00
layerdiffusion
616b335fce move file 2024-08-15 05:45:55 -07:00
layerdiffusion
1bd6cf0e0c Support LoRAs for Q8/Q5/Q4 GGUF Models
what a crazy night of math
2024-08-15 05:34:46 -07:00
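The math behind LoRA support for quantized models, as in the commit above, boils down to dequantize, add the low-rank update, requantize. A toy numpy sketch with a simple symmetric int8 scheme (a stand-in for the real GGUF block formats; all names here are illustrative, not Forge's actual code):

```python
import numpy as np

# Toy symmetric int8 quantization, standing in for GGUF Q8-style blocks
# (the real formats use per-block scales and, for Q5/Q4, packed sub-bytes).
def quantize(w):
    scale = float(np.abs(w).max()) / 127.0 or 1.0
    return np.round(w / scale).astype(np.int8), scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

# Applying a LoRA to a quantized base weight: dequantize, add the low-rank
# update (alpha / rank) * B @ A, then requantize the merged result.
def apply_lora(q, scale, A, B, alpha):
    rank = A.shape[0]
    merged = dequantize(q, scale) + (alpha / rank) * (B @ A)
    return quantize(merged)

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8)).astype(np.float32)         # base weight
A = rng.standard_normal((2, 8)).astype(np.float32) * 0.01  # LoRA down
B = rng.standard_normal((8, 2)).astype(np.float32) * 0.01  # LoRA up
q, s = quantize(W)
q2, s2 = apply_lora(q, s, A, B, alpha=2.0)
```

The merged result matches `W + (alpha / rank) * B @ A` up to quantization error; the hard part for real Q5/Q4 blocks is doing this per block while matching the official packing exactly.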
layerdiffusion
2690b654fd reimplement q8/q5/q4, reviewed and matched against official gguf 2024-08-15 02:41:15 -07:00
layerdiffusion
ce16d34d03 disable xformers for t5 2024-08-15 00:55:49 -07:00
layerdiffusion
d336597fa5 add note to lora
but LoRAs for NF4 are done already!
2024-08-15 00:42:48 -07:00
layerdiffusion
7fcfb93090 ling 2024-08-15 00:39:12 -07:00
layerdiffusion
0524133714 ling 2024-08-15 00:33:21 -07:00
layerdiffusion
fb62214a32 rewrite some functions 2024-08-15 00:29:19 -07:00
layerdiffusion
c74f603ea2 remove super call 2024-08-15 00:23:31 -07:00
layerdiffusion
d0518b7249 make prints beautiful 2024-08-15 00:20:03 -07:00
layerdiffusion
d8b83a9501 gguf preview 2024-08-15 00:03:32 -07:00
layerdiffusion
59790f2cb4 simplify code 2024-08-14 20:48:39 -07:00
layerdiffusion
4b66cf1126 fix possible OOM again 2024-08-14 20:45:58 -07:00