Commit Graph

17 Commits

Author SHA1 Message Date
layerdiffusion
4c9380c46a Speed up quant model loading and inference based on three observations:
1. torch.Tensor.view on one big tensor is slightly faster than calling torch.Tensor.to on multiple small tensors.
2. torch.Tensor.to with a dtype change is significantly slower than torch.Tensor.view.
3. “Baking” the model on the GPU is significantly faster than computing on the CPU at model load time.

Mainly affects inference of Q8_0, Q4_0/1/K and loading of all quants.
2024-08-30 00:49:05 -07:00
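The difference between reinterpreting memory and converting it, which points 1 and 2 of this commit message rely on, can be illustrated with a small standard-library sketch. This is not Forge or PyTorch code: `memoryview.cast` stands in for `torch.Tensor.view`, a per-element list comprehension stands in for `torch.Tensor.to` with a dtype change, and the buffer contents are made up for illustration.

```python
import struct

# Hypothetical raw buffer: four native-endian int16 values packed back to back.
raw = struct.pack("4h", 1, -2, 300, -400)

# "View": reinterpret the same bytes as int16. No element is copied or
# recomputed; the view still points at the original buffer.
viewed = memoryview(raw).cast("h")

# "Convert": build a new float for every element, which allocates new
# storage and touches each value.
converted = [float(v) for v in viewed]

print(list(viewed))
print(converted)
print(viewed.obj is raw)
```

The view costs the same regardless of buffer size, while the conversion scales with the number of elements, which is consistent with the ordering the commit message reports.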
layerdiffusion
f22b80ef94 restrict baking to 16bits 2024-08-26 06:16:13 -07:00
layerdiffusion
cae37a2725 fix dequant of unbaked parameters 2024-08-25 17:24:31 -07:00
layerdiffusion
13d6f8ed90 revise GGUF by precomputing some parameters
rather than computing them in each diffusion iteration
2024-08-25 14:30:09 -07:00
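As a rough sketch of what precomputing parameters once can look like, the class below decodes the per-block scales of Q8_0-style data at construction time and reuses them on every later call. The class name `BakedQ8Blocks` and its attributes are hypothetical and are not taken from Forge; the 34-byte block layout (one fp16 scale followed by 32 signed bytes) is the ggml Q8_0 layout.

```python
import struct

BLOCK_BYTES = 34  # 2-byte fp16 scale + 32 int8 values


class BakedQ8Blocks:
    """Hypothetical holder that decodes block parameters once, at load time."""

    def __init__(self, data: bytes):
        n = len(data) // BLOCK_BYTES
        blocks = [
            struct.unpack_from("<e32b", data, i * BLOCK_BYTES) for i in range(n)
        ]
        # Precomputed once: fp16 scales decoded to Python floats.
        self.scales = [b[0] for b in blocks]
        # Precomputed once: the signed 8-bit quants of each block.
        self.quants = [b[1:] for b in blocks]

    def dequant(self) -> list:
        """Called every iteration; only multiplies, no byte decoding."""
        out = []
        for d, qs in zip(self.scales, self.quants):
            out.extend(d * q for q in qs)
        return out


# Usage: two made-up blocks with scales 0.5 and 2.0.
data = struct.pack("<e32b", 0.5, *range(32)) + struct.pack("<e32b", 2.0, *([1] * 32))
weights = BakedQ8Blocks(data)
values = weights.dequant()
```

The byte parsing happens once in `__init__`, so a loop that calls `dequant()` per iteration pays only for the multiplications.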
lllyasviel
f82029c5cf support more t5 quants (#1482)
let's hope this is the last time that people randomly invent new state dict key formats
2024-08-24 12:47:49 -07:00
layerdiffusion
d38e560e42 Rework parts of the LoRA system
1. Add an option that lets users run the UNet in fp8/gguf while keeping the LoRA in fp16.
2. FP16 LoRAs need no patching. Others are re-patched only when the LoRA weight changes.
3. FP8 UNet + fp16 LoRA is now available in Forge (and more or less only in Forge). This also solves some “LoRA too subtle” problems.
4. Significantly speed up all gguf models (in Async mode) by using an independent thread (CUDA stream) to compute and dequantize at the same time, even when the low-bit weights are already on the GPU.
5. Treat “online LoRA” as a module similar to ControlLoRA so that it is moved to the GPU together with the model when sampling, achieving a significant speedup and perfect low-VRAM management simultaneously.
2024-08-19 04:31:59 -07:00
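The distinction between patching a LoRA into the weights and applying it as a separate module can be sketched with tiny matrices. This assumes the standard low-rank LoRA formulation (a delta of `B @ A` added to the base weight) and is not Forge's implementation; all values and helper names here are made up for illustration.

```python
def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def matmul(a, b):
    """Multiply matrix a by matrix b (both lists of rows)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


W = [[1.0, 2.0], [3.0, 4.0]]  # stands in for the (dequantized) base weight
B = [[1.0], [2.0]]            # hypothetical LoRA "up" matrix, rank 1
A = [[0.5, -1.0]]             # hypothetical LoRA "down" matrix
x = [2.0, 4.0]

# Patching: merge the LoRA delta into the weight, then run the layer.
delta = matmul(B, A)
W_patched = [[w + d for w, d in zip(wr, dr)] for wr, dr in zip(W, delta)]
y_patched = matvec(W_patched, x)

# Separate module: leave W untouched and add the LoRA branch to the output.
y_online = [a + b for a, b in zip(matvec(W, x), matvec(B, matvec(A, x)))]

print(y_patched, y_online)
```

Both paths give the same output, but the second never rewrites `W`, so a low-bit base weight can stay in its stored format while the LoRA matrices stay in a higher precision.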
layerdiffusion
e5f213c21e upload some GGUF support 2024-08-19 01:09:50 -07:00
layerdiffusion
8a04293430 fix some gguf loras 2024-08-17 01:15:37 -07:00
layerdiffusion
2f0555f7dc GPU Shared Async Swap for all GGUF/BNB 2024-08-16 08:45:17 -07:00
layerdiffusion
243952f364 wip qx_1 loras 2024-08-15 17:07:41 -07:00
layerdiffusion
616b335fce move file 2024-08-15 05:45:55 -07:00
layerdiffusion
1bd6cf0e0c Support LoRAs for Q8/Q5/Q4 GGUF Models
what a crazy night of math
2024-08-15 05:34:46 -07:00
layerdiffusion
2690b654fd reimplement q8/q5/q4 and review and match official gguf 2024-08-15 02:41:15 -07:00
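For orientation, here is a minimal standard-library sketch of dequantizing a single Q4_0 block as laid out in ggml: a little-endian fp16 scale followed by 16 bytes that hold 32 four-bit values, with low nibbles giving the first 16 elements and high nibbles the last 16. The function name is hypothetical and this is not the code from the commit.

```python
import struct

QK4_0 = 32  # elements per Q4_0 block


def dequant_q4_0_block(block: bytes) -> list:
    """Dequantize one 18-byte Q4_0 block into 32 floats."""
    d, *packed = struct.unpack("<e16B", block)
    # Each nibble stores an unsigned value 0..15; subtracting 8 recentres it.
    low = [(b & 0x0F) - 8 for b in packed]   # elements 0..15
    high = [(b >> 4) - 8 for b in packed]    # elements 16..31
    return [d * q for q in low + high]


# Usage: scale 2.0, low nibbles 0..15, every high nibble equal to 8.
block = struct.pack("<e16B", 2.0, *[0x80 | i for i in range(16)])
values = dequant_q4_0_block(block)
```

With this input the first 16 outputs run from -16.0 to 14.0 in steps of 2.0, and the last 16 are all 0.0 because a high nibble of 8 maps to zero after recentring.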
layerdiffusion
7fcfb93090 ling 2024-08-15 00:39:12 -07:00
layerdiffusion
0524133714 ling 2024-08-15 00:33:21 -07:00
layerdiffusion
fb62214a32 rewrite some functions 2024-08-15 00:29:19 -07:00
layerdiffusion
d8b83a9501 gguf preview 2024-08-15 00:03:32 -07:00