stable-diffusion-webui-forge

lllyasviel/stable-diffusion-webui-forge

mirror of https://github.com/lllyasviel/stable-diffusion-webui-forge.git synced 2026-03-07 14:09:47 +00:00

Commit Graph

Author	SHA1	Message	Date
layerdiffusion	acf99dd74e	fix old version of pytorch	2024-08-26 06:51:48 -07:00
layerdiffusion	82dfc2b15b	Significantly speed up Q4_0, Q4_1, and Q4_K by precomputing all possible 4-bit dequantized values into a lookup table and using PyTorch indexing to fetch them, rather than computing the bit operations at run time. This should give performance very similar to native CUDA kernels while being LoRA-friendly and more flexible.	2024-08-25 16:49:33 -07:00

2 Commits
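
The lookup-table idea described in commit 82dfc2b15b can be sketched roughly as follows. This is not the code from the commit. It is a minimal illustration that assumes the standard GGML Q4_0 layout (32 weights per block, one scale per block, 16 packed bytes with the low nibbles holding the first 16 weights and the high nibbles the last 16, each offset by 8). It uses NumPy rather than PyTorch so that it runs without a GPU stack; the same `table[index]` gather works on PyTorch tensors. The names `NIBBLE_LUT`, `dequant_q4_0_bitops`, and `dequant_q4_0_lut` are hypothetical.

```python
import numpy as np

# Hypothetical lookup table: for each of the 256 possible byte values, store
# the two signed values it packs (low nibble, high nibble), offset by -8 as
# in the assumed Q4_0 layout. It is computed once and then reused.
BYTE_VALUES = np.arange(256, dtype=np.uint8)
NIBBLE_LUT = np.stack(
    [
        (BYTE_VALUES & 0x0F).astype(np.float32) - 8.0,
        (BYTE_VALUES >> 4).astype(np.float32) - 8.0,
    ],
    axis=1,
)  # shape (256, 2)


def dequant_q4_0_bitops(qs: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Reference path: unpack nibbles with bit operations on every call.

    qs:     uint8 array of shape (n_blocks, 16), packed 4-bit weights
    scales: float32 array of shape (n_blocks,), one scale per block
    """
    low = (qs & 0x0F).astype(np.float32) - 8.0
    high = (qs >> 4).astype(np.float32) - 8.0
    # Low nibbles fill positions 0..15 of each block, high nibbles 16..31.
    return np.concatenate([low, high], axis=1) * scales[:, None]


def dequant_q4_0_lut(qs: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Lookup path: one gather from the precomputed table, no bit operations."""
    pairs = NIBBLE_LUT[qs]  # shape (n_blocks, 16, 2)
    # Reorder so all low-nibble values come first, then all high-nibble values.
    values = pairs.transpose(0, 2, 1).reshape(qs.shape[0], -1)
    return values * scales[:, None]
```

Both functions return a dense float32 array of shape (n_blocks, 32), so the result can be used like any unquantized weight. Whether this gather approaches native CUDA kernel speed, as the commit message expects, depends on the hardware and the PyTorch version, and the sketch makes no claim about that.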