Mirror of https://github.com/comfyanonymous/ComfyUI.git
synced 2026-03-02 11:50:11 +00:00
Allow non-QuantizedTensor layers to set want_requant, which stochastically casts the post-lora calculation back down to the original input dtype. The legacy fp8 Linear implementation uses this to set compute_dtype to the preferred lora dtype and then want_requant the result back down to fp8. This fixes the issue where combining --fast fp8_matrix_mult with --fast dynamic_vram broke applying a lora to an fp8 non-QuantizedTensor model.
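The "stochastic cast down" mentioned above refers to stochastic rounding: when requantizing the higher-precision lora result back to a coarse dtype like fp8, each value is rounded up or down at random with probability proportional to its position between the two nearest representable values, so the quantization error is unbiased in expectation. The following is a minimal numeric sketch of that idea, not ComfyUI's actual implementation; the uniform grid spacing `step` is a stand-in for the (non-uniform) spacing of a real fp8 format:

```python
import numpy as np

def stochastic_cast_down(x, step=0.25, rng=None):
    """Quantize values onto a coarse grid (spacing `step`) with stochastic
    rounding: round up with probability equal to the fractional position
    within the gap, so the result equals x in expectation.

    Illustrative sketch only; a real fp8 cast would use the format's own
    representable values instead of a uniform grid."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=np.float64)
    scaled = x / step
    lower = np.floor(scaled)
    frac = scaled - lower                    # position in the gap, in [0, 1)
    round_up = rng.random(x.shape) < frac    # round up with probability frac
    return (lower + round_up) * step

# Averaging many stochastic casts recovers the original value, which
# deterministic round-to-nearest cannot do for values between grid points.
rng = np.random.default_rng(0)
samples = np.array([stochastic_cast_down(0.6, rng=rng) for _ in range(20000)])
print(samples.mean())  # close to 0.6, even though the grid only has 0.5 and 0.75
```

The unbiasedness is what makes this preferable to round-to-nearest when repeatedly accumulating small lora deltas into a low-precision weight: deterministic rounding would systematically lose updates smaller than half a quantization step.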
41 KiB