Mirror of https://github.com/lllyasviel/stable-diffusion-webui-forge.git (synced 2026-04-27 09:41:31 +00:00)
by precomputing all possible 4-bit dequantized values into a lookup table and using PyTorch indexing to perform dequantization, rather than actually computing the bit operations. This should give performance very similar to native CUDA kernels, while being LoRA-friendly and more flexible.
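The lookup-table idea above can be sketched as follows. This is a minimal NumPy illustration of the technique, not Forge's actual code: for a Q4_0-style format, every packed byte holds two 4-bit values, so a precomputed 256-entry table mapping each byte to its two signed nibbles lets fancy indexing replace per-element mask/shift operations at dequant time. The names `BYTE_TO_NIBBLES` and `dequant_q4` are hypothetical.

```python
import numpy as np

# Hypothetical sketch: precompute, for each of the 256 possible packed
# bytes, the two 4-bit values it contains (low nibble first), already
# shifted into the signed range [-8, 7] used by Q4_0-style formats.
BYTE_TO_NIBBLES = np.array(
    [[(b & 0x0F) - 8, (b >> 4) - 8] for b in range(256)],
    dtype=np.float32,
)  # shape (256, 2)

def dequant_q4(packed: np.ndarray, scale: float) -> np.ndarray:
    """Dequantize packed 4-bit weights via table indexing.

    `packed` is a uint8 array of packed nibble pairs; indexing into the
    precomputed table replaces the bitwise mask/shift work, so the hot
    path is a single gather plus a multiply by the block scale.
    """
    return BYTE_TO_NIBBLES[packed].reshape(-1) * scale
```

For example, the byte `0x6B` packs the signed nibbles 3 (low) and -2 (high), so with a scale of 0.5 it dequantizes to `[1.5, -1.0]`. In PyTorch the same pattern applies with a table held as a tensor on the GPU, which is what keeps the approach differentiable-indexing-friendly for LoRA.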
This is Forge's implementation of GGUF. The code is based on llama.cpp's GGUF; the difference is that it supports PyTorch quant/dequant.