Commit Graph

41 Commits

Author SHA1 Message Date
DenOfEquity
19a9a78c9b fix/workaround for potential memory leak (#2315)
unload old models based on reference count <= 2
(in practice, extra copies were only observed for JointTextEncoder, never KModel or IntegratedAutoencoderKL)
#2281 #2308 and others
2024-11-14 22:05:54 +00:00
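The refcount heuristic described in this commit can be sketched in pure Python. This is a hypothetical illustration (ModelCache and its methods are not Forge's actual API): a model is considered safe to unload when nothing outside the cache still references it, and the threshold accounts for the temporary references that counting itself creates.

```python
import sys

# Hypothetical sketch of "unload old models based on reference count" (CPython
# only; sys.getrefcount is implementation-specific). Names are illustrative.
class ModelCache:
    def __init__(self):
        self._models = []

    def register(self, model):
        self._models.append(model)

    def unload_unreferenced(self, threshold=3):
        # At the moment of the check, each model is referenced by: the cache's
        # list entry, the comprehension's loop variable, and getrefcount's own
        # argument -- three "internal" references. Anything above the threshold
        # means an external holder still exists, so the model is kept.
        kept = [m for m in self._models if sys.getrefcount(m) > threshold]
        unloaded = len(self._models) - len(kept)
        self._models = kept
        return unloaded
```

The threshold plays the role of the "<= 2" check in the commit message, shifted by getrefcount's extra temporary reference.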
layerdiffusion
79b25a8235 move codes 2024-08-31 11:31:02 -07:00
layerdiffusion
33963f2d19 always compute on-the-fly LoRA weights when offloading 2024-08-31 11:24:23 -07:00
layerdiffusion
1f91b35a43 add signal_empty_cache 2024-08-31 10:20:22 -07:00
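A signal_empty_cache-style mechanism (see also "add a way to empty cuda cache on the fly" below) can be sketched as a deferred-flush pattern. This is an assumption about the design, not Forge's actual implementation: callers set a flag instead of flushing immediately, and the main loop flushes at a safe point between steps.

```python
# Hedged sketch of a signal-then-flush pattern: rather than calling
# torch.cuda.empty_cache() directly from arbitrary code paths (which can stall
# concurrent work), request a flush and perform it later at a safe point.
_cache_flush_requested = False

def signal_empty_cache():
    """Request a cache flush; cheap and safe to call from anywhere."""
    global _cache_flush_requested
    _cache_flush_requested = True

def process_signals(flush_fn):
    """Call at a safe point (e.g. between sampling steps).

    flush_fn would be torch.cuda.empty_cache in a real backend; it is a
    parameter here so the sketch stays framework-free.
    """
    global _cache_flush_requested
    if _cache_flush_requested:
        _cache_flush_requested = False
        flush_fn()
        return True
    return False
```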
layerdiffusion
d1d0ec46aa Maintain patching related
1. fix several problems where layerdiffuse was not unloaded
2. fix several problems related to Fooocus inpaint
3. slightly speed up on-the-fly LoRAs by precomputing them in the computation dtype
2024-08-30 15:18:21 -07:00
layerdiffusion
f04666b19b Attempt #1575 2024-08-30 09:41:36 -07:00
layerdiffusion
4c9380c46a Speed up quant model loading and inference ...
... based on three observations:
1. torch.Tensor.view on one big tensor is slightly faster than calling torch.Tensor.to on multiple small tensors.
2. torch.Tensor.to with a dtype change, however, is significantly slower than torch.Tensor.view.
3. "baking" the model on the GPU is significantly faster than computing on the CPU at model-load time.

mainly influences inference of Q8_0 and Q4_0/1/K, and the loading of all quants
2024-08-30 00:49:05 -07:00
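The observations above argue for keeping quantized weights in one big flat buffer and dequantizing in a single pass, rather than converting many small per-block tensors. A simplified pure-Python sketch of that layout idea (the Q8_0-style format here — int8 weights plus one scale per block — is reduced to its essentials and is not Forge's actual code):

```python
# Illustrative only: contrasts per-block conversion (many small operations,
# analogous to torch.Tensor.to on multiple small tensors) with a single pass
# over one flat buffer (analogous to torch.Tensor.view on one big tensor).
BLOCK = 4  # real Q8_0 uses 32 weights per block

def dequant_per_block(blocks):
    """Per-block conversion: one small operation per (scale, weights) block."""
    out = []
    for scale, qs in blocks:
        out.extend(q * scale for q in qs)
    return out

def dequant_flat(scales, flat_qs):
    """Single pass over one flat quantized buffer, indexing scales per block."""
    return [q * scales[i // BLOCK] for i, q in enumerate(flat_qs)]
```

Both produce identical results; the point is that the flat layout turns many small conversions into one large one, which is where the speedup in observation 1 comes from.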
layerdiffusion
95e16f7204 Maintain loading-related code
1. revise model-moving order
2. less verbose printing
3. some misc minor speedups
4. some bnb-related maintenance
2024-08-29 19:05:48 -07:00
layerdiffusion
d339600181 fix 2024-08-28 09:56:18 -07:00
layerdiffusion
0abb6c4686 Second Attempt for #1502 2024-08-28 08:08:40 -07:00
layerdiffusion
68bf7f85aa speed up nf4 lora in offline patching mode 2024-08-22 10:35:11 -07:00
layerdiffusion
95d04e5c8f fix 2024-08-22 10:08:21 -07:00
layerdiffusion
14eac6f2cf add a way to empty cuda cache on the fly 2024-08-22 10:06:39 -07:00
layerdiffusion
909ad6c734 fix prints 2024-08-21 22:24:54 -07:00
layerdiffusion
4e3c78178a [revised] change some dtype behaviors based on community feedback
only affects older devices like the 1080/70/60/50.
if you are on a 1080/70/60/50 and previously used many cmd flags to tune performance, please remove those flags
2024-08-21 10:23:38 -07:00
layerdiffusion
1419ef29aa Revert "change some dtype behaviors based on community feedbacks"
This reverts commit 31bed671ac.
2024-08-21 10:10:49 -07:00
layerdiffusion
31bed671ac change some dtype behaviors based on community feedback
only affects older devices like the 1080/70/60/50.
if you are on a 1080/70/60/50 and previously used many cmd flags to tune performance, please remove those flags
2024-08-21 08:46:52 -07:00
layerdiffusion
475524496d revise 2024-08-19 18:54:54 -07:00
layerdiffusion
d7151b4dcd add low vram warning 2024-08-19 11:08:01 -07:00
layerdiffusion
d38e560e42 Implement some rethinking about LoRA system
1. Add an option that lets users run the UNet in fp8/gguf but LoRAs in fp16.
2. FP16 LoRAs need no patching; others are only re-patched when LoRA weights change.
3. FP8 UNet + fp16 LoRA is now available in Forge (and, for now, essentially only in Forge). This also solves some "LoRA too subtle" problems.
4. Significantly speed up all gguf models (in Async mode) by using an independent thread (CUDA stream) to compute and dequantize at the same time, even when low-bit weights are already on the GPU.
5. Treat "online LoRA" as a module, similar to ControlLoRA, so that it moves to the GPU together with the model when sampling, achieving significant speedup and perfect low-VRAM management simultaneously.
2024-08-19 04:31:59 -07:00
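The on-the-fly patch described in points 2-3 follows the standard LoRA formula: the effective weight is W' = W + (alpha / rank) * (B @ A), computed in the chosen computation dtype (e.g. fp16 even when W is stored in fp8/gguf). A minimal pure-Python sketch, with plain nested lists standing in for tensors (function names are illustrative, not Forge's API):

```python
# Hedged sketch of an on-the-fly LoRA merge. A is (rank x in_features),
# B is (out_features x rank); their product is a full-rank-shaped delta
# that gets scaled and added to the base weight.
def matmul(B, A):
    rows, inner, cols = len(B), len(A), len(A[0])
    return [[sum(B[i][k] * A[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]

def apply_lora(W, A, B, alpha):
    rank = len(A)
    scale = alpha / rank
    delta = matmul(B, A)
    # element-wise: W' = W + scale * (B @ A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]
```

Because only A, B, and alpha change between LoRAs, re-patching is only needed when those change — which is what point 2 exploits.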
layerdiffusion
ab4b0d5b58 fix some mem leak 2024-08-17 00:19:43 -07:00
layerdiffusion
394da01959 simplify 2024-08-16 04:55:01 -07:00
layerdiffusion
e36487ffa5 tune 2024-08-16 04:49:25 -07:00
lllyasviel
6e6e5c2162 do some profiling on a 3090 2024-08-16 04:43:19 -07:00
layerdiffusion
7c0f78e424 reduce cast 2024-08-16 03:59:59 -07:00
layerdiffusion
d8b83a9501 gguf preview 2024-08-15 00:03:32 -07:00
layerdiffusion
59790f2cb4 simplify codes 2024-08-14 20:48:39 -07:00
layerdiffusion
b31f81628f Revert "simplify codes"
This reverts commit 2cc5aa7a3e.
2024-08-14 20:39:00 -07:00
layerdiffusion
2cc5aa7a3e simplify codes 2024-08-14 20:35:28 -07:00
layerdiffusion
aff742b597 speed up LoRA based on CUDA profiling 2024-08-14 19:09:35 -07:00
lllyasviel
61f83dd610 support all flux models 2024-08-13 05:42:17 -07:00
layerdiffusion
f6ef105cb3 fix wrong print 2024-08-12 03:58:58 -07:00
layerdiffusion
a16ca5d057 fix amd 2024-08-11 17:53:08 -07:00
lllyasviel
cfa5242a75 forge 2.0.0
see also discussions
2024-08-10 19:24:19 -07:00
layerdiffusion
6f254f3599 revise stream 2024-08-08 20:18:56 -07:00
layerdiffusion
60c5aea11b revise stream logics 2024-08-08 18:45:36 -07:00
layerdiffusion
e1df7a1bae revise kernel 2024-08-07 17:24:22 -07:00
layerdiffusion
b57573c8da Implement many kernels from scratch 2024-08-06 20:19:03 -07:00
lllyasviel
71c94799d1 diffusion in fp8 landed 2024-08-06 16:47:39 -07:00
layerdiffusion
318219bc9d move file 2024-08-02 03:37:20 -07:00
layerdiffusion
bc9977a305 UNet from Scratch
The backend rewrite is now about 50% finished; estimated completion is within 72 hours.
After that, many new features will land.
2024-08-01 21:19:41 -07:00