2 Commits

Author SHA1 Message Date
layerdiffusion
acf99dd74e Fix for old versions of PyTorch 2024-08-26 06:51:48 -07:00
layerdiffusion
82dfc2b15b Significantly speed up Q4_0, Q4_1, Q4_K
by precomputing all possible 4-bit dequantized values into a lookup table and using PyTorch indexing to fetch them, rather than performing the bit operations at run time.
This should give performance very similar to native CUDA kernels, while being LoRA-friendly and more flexible
2024-08-25 16:49:33 -07:00
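
The lookup-table idea described in commit 82dfc2b15b can be illustrated with a minimal sketch. This is not the code from that commit: the names `LUT`, `dequant_lut`, and `dequant_bitops` are hypothetical, and the layout is a simplified Q4_0-style block (16 packed bytes per block, one float scale per block, offset of 8, low nibbles first). Q4_1 and Q4_K carry extra per-block parameters and are not covered here.

```python
import torch

# Precompute, for each of the 256 possible byte values, the two signed
# 4-bit values it packs: column 0 = low nibble, column 1 = high nibble.
# The "- 8" offset follows the Q4_0 convention (an assumption of this sketch).
_byte_values = torch.arange(256, dtype=torch.int32)
LUT = torch.stack(
    [(_byte_values & 0xF) - 8, (_byte_values >> 4) - 8], dim=1
).to(torch.float32)  # shape (256, 2)


def dequant_lut(packed: torch.Tensor, scales: torch.Tensor) -> torch.Tensor:
    """Dequantize via table lookup.

    packed: uint8 tensor of shape (n_blocks, 16)
    scales: float tensor of shape (n_blocks,)
    returns: float32 tensor of shape (n_blocks, 32)
    """
    # A single indexing op replaces the mask/shift/subtract sequence.
    vals = LUT[packed.long()]  # (n_blocks, 16, 2)
    # Low nibbles form the first 16 weights, high nibbles the last 16.
    vals = torch.cat([vals[..., 0], vals[..., 1]], dim=-1)
    return vals * scales[:, None]


def dequant_bitops(packed: torch.Tensor, scales: torch.Tensor) -> torch.Tensor:
    """Reference path that computes the bit operations directly."""
    p = packed.to(torch.int32)
    lo = (p & 0xF) - 8
    hi = (p >> 4) - 8
    return torch.cat([lo, hi], dim=-1).to(torch.float32) * scales[:, None]


if __name__ == "__main__":
    torch.manual_seed(0)
    packed = torch.randint(0, 256, (4, 16), dtype=torch.uint8)
    scales = torch.rand(4)
    # Both paths should produce identical results.
    print(torch.equal(dequant_lut(packed, scales), dequant_bitops(packed, scales)))
```

Because the result is an ordinary floating-point tensor produced by standard PyTorch ops, it can be combined with LoRA deltas or other weight patches without a custom kernel, which is presumably the flexibility the commit message refers to.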