Mirror of https://github.com/lllyasviel/stable-diffusion-webui-forge.git (synced 2026-04-27 09:41:31 +00:00)
by precomputing all possible 4-bit dequantized values into a lookup table and using PyTorch indexing to perform dequantization, rather than actually computing the bit operations. This should give performance very similar to native CUDA kernels, while being LoRA-friendly and more flexible.
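The lookup-table idea above can be sketched as follows. This is a minimal NumPy illustration of the technique, not Forge's actual code: for a Q4_0-style format, every packed byte holds two 4-bit values, so a precomputed 256-entry table mapping each byte to its two signed nibbles lets fancy indexing replace per-element mask/shift operations at dequant time. The names `BYTE_TO_NIBBLES` and `dequant_q4` are hypothetical.

```python
import numpy as np

# Hypothetical sketch: precompute, for each of the 256 possible packed
# bytes, the two 4-bit values it contains (low nibble first), already
# shifted into the signed range [-8, 7] used by Q4_0-style formats.
BYTE_TO_NIBBLES = np.array(
    [[(b & 0x0F) - 8, (b >> 4) - 8] for b in range(256)],
    dtype=np.float32,
)  # shape (256, 2)

def dequant_q4(packed: np.ndarray, scale: float) -> np.ndarray:
    """Dequantize packed 4-bit weights via table indexing.

    `packed` is a uint8 array of packed nibble pairs; indexing into the
    precomputed table replaces the bitwise mask/shift work, so the hot
    path is a single gather plus a multiply by the block scale.
    """
    return BYTE_TO_NIBBLES[packed].reshape(-1) * scale
```

For example, the byte `0x6B` packs the signed nibbles 3 (low) and -2 (high), so with a scale of 0.5 it dequantizes to `[1.5, -1.0]`. In PyTorch the same pattern applies with a table held as a tensor on the GPU, which is what keeps the approach differentiable-indexing-friendly for LoRA.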
This is Forge's implementation of GGUF. The code is based on llama.cpp's GGUF; the difference is that it supports PyTorch quant/dequant.