stable-diffusion-webui-forge

mirror of https://github.com/lllyasviel/stable-diffusion-webui-forge.git synced 2026-03-10 23:49:48 +00:00

Author	SHA1	Message	Date
layerdiffusion	4c9380c46a	Speed up quant model loading and inference based on 3 evidences: 1. torch.Tensor.view on one big tensor is slightly faster than calling torch.Tensor.to on multiple small tensors. 2. but torch.Tensor.to with dtype change is significantly slower than torch.Tensor.view 3. “baking” model on GPU is significantly faster than computing on CPU when model load. mainly influence inference of Q8_0, Q4_0/1/K and loading of all quants	2024-08-30 00:49:05 -07:00
layerdiffusion	f22b80ef94	restrict baking to 16bits	2024-08-26 06:16:13 -07:00
layerdiffusion	cae37a2725	fix dequant of unbaked parameters	2024-08-25 17:24:31 -07:00
layerdiffusion	13d6f8ed90	revise GGUF by precomputing some parameters rather than computing them in each diffusion iteration	2024-08-25 14:30:09 -07:00
lllyasviel	f82029c5cf	support more t5 quants (#1482) lets hope this is the last time that people randomly invent new state dict key formats	2024-08-24 12:47:49 -07:00
layerdiffusion	d38e560e42	Implement some rethinking about LoRA system 1. Add an option to allow users to use UNet in fp8/gguf but lora in fp16. 2. All FP16 loras do not need patch. Others will only patch again when lora weight change. 3. FP8 unet + fp16 lora are available (somewhat only available) in Forge now. This also solves some “LoRA too subtle” problems. 4. Significantly speed up all gguf models (in Async mode) by using independent thread (CUDA stream) to compute and dequant at the same time, even when low-bit weights are already on GPU. 5. View “online lora” as a module similar to ControlLoRA so that it is moved to GPU together with model when sampling, achieving significant speedup and perfect low VRAM management simultaneously.	2024-08-19 04:31:59 -07:00
layerdiffusion	e5f213c21e	upload some GGUF supports	2024-08-19 01:09:50 -07:00
layerdiffusion	8a04293430	fix some gguf loras	2024-08-17 01:15:37 -07:00
layerdiffusion	2f0555f7dc	GPU Shared Async Swap for all GGUF/BNB	2024-08-16 08:45:17 -07:00
layerdiffusion	243952f364	wip qx_1 loras	2024-08-15 17:07:41 -07:00
layerdiffusion	616b335fce	move file	2024-08-15 05:45:55 -07:00
layerdiffusion	1bd6cf0e0c	Support LoRAs for Q8/Q5/Q4 GGUF Models what a crazy night of math	2024-08-15 05:34:46 -07:00
layerdiffusion	2690b654fd	reimplement q8/q85/q4 and review and match official gguf	2024-08-15 02:41:15 -07:00
layerdiffusion	7fcfb93090	ling	2024-08-15 00:39:12 -07:00
layerdiffusion	0524133714	ling	2024-08-15 00:33:21 -07:00
layerdiffusion	fb62214a32	rewrite some functions	2024-08-15 00:29:19 -07:00
layerdiffusion	d8b83a9501	gguf preview	2024-08-15 00:03:32 -07:00

17 Commits
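The newest commit (4c9380c46a) makes two related claims. First, reinterpreting one large buffer with a view is cheaper than running many per-tensor dtype conversions. Second, commit 13d6f8ed90 adds that some parameters can be precomputed ("baked") once rather than recomputed in every diffusion iteration. The sketch below applies that idea to GGUF's Q8_0 layout, where each block holds a float16 scale followed by 32 int8 quants, 34 bytes in total. It is a minimal illustration and not Forge's actual code. It uses NumPy's `ndarray.view` with a structured dtype as a stand-in for `torch.Tensor.view`. The helper names `quantize_q8_0`, `bake_q8_0` and `dequantize_q8_0` are hypothetical.

```python
import numpy as np

QK8_0 = 32  # weights per Q8_0 block
# One Q8_0 block: a float16 scale `d` followed by 32 int8 quants (34 bytes, unpadded).
BLOCK_Q8_0 = np.dtype([("d", "<f2"), ("qs", "i1", (QK8_0,))])


def quantize_q8_0(w):
    """Hypothetical demo helper: pack float32 weights into a flat Q8_0 byte buffer."""
    blocks = np.asarray(w, dtype=np.float32).reshape(-1, QK8_0)
    d = np.abs(blocks).max(axis=1) / 127.0
    # Guard all-zero blocks against division by zero.
    inv_d = np.divide(1.0, d, out=np.zeros_like(d), where=d > 0)
    packed = np.zeros(len(blocks), dtype=BLOCK_Q8_0)
    packed["d"] = d  # stored as float16
    packed["qs"] = np.round(blocks * inv_d[:, None]).clip(-127, 127).astype(np.int8)
    return packed.view(np.uint8)  # one flat uint8 buffer, like bytes read from a file


def bake_q8_0(raw):
    """Hypothetical one-time 'baking': reinterpret the bytes and precompute scales."""
    blocks = raw.view(BLOCK_Q8_0)  # zero-copy reinterpretation of the whole buffer
    scales = blocks["d"].astype(np.float32)[:, None]  # dtype conversion done once
    return scales, blocks["qs"]  # qs remains a view into `raw`


def dequantize_q8_0(scales, qs):
    """Per-call work is reduced to a broadcast multiply: y = d * q."""
    return (scales * qs).reshape(-1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal(4 * QK8_0).astype(np.float32)
    raw = quantize_q8_0(w)
    print(raw.nbytes)  # 136: four blocks of 34 bytes
    scales, qs = bake_q8_0(raw)
    print(np.abs(w - dequantize_q8_0(scales, qs)).max() < 0.05)
```

Running the script prints 136 and then True. The byte reinterpretation and the scale conversion each happen once, in `bake_q8_0`. After that, each call to `dequantize_q8_0` only performs a multiply. This mirrors the commit's reasoning under the assumptions stated above.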