ComfyUI/comfy
rattus e721e24136 ops: implement lora requanting for non QuantizedTensor fp8 (#12668)
Allow a non QuantizedTensor layer to set want_requant so that the post lora
calculation is stochastically cast down to the original input dtype.
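
As a rough illustration of what a stochastic cast down does, here is a minimal
sketch in plain Python. It is not the ComfyUI implementation: the names
stochastic_round and mean_after_rounding are hypothetical, and a uniform grid
with spacing `step` stands in for fp8, whose representable values are in
reality not evenly spaced. The point it shows is that rounding up with
probability equal to the fractional remainder keeps the result unbiased on
average, which is why it suits casting a higher precision lora result back to
a coarse dtype.

```python
import math
import random


def stochastic_round(x: float, step: float, rng: random.Random) -> float:
    """Round x to a multiple of step (hypothetical helper, uniform grid assumed).

    The value is rounded up with probability equal to its fractional
    position between the two neighbouring grid points, otherwise down.
    """
    scaled = x / step
    lower = math.floor(scaled)
    frac = scaled - lower  # in [0, 1): distance above the lower grid point
    rounded = lower + (1 if rng.random() < frac else 0)
    return rounded * step


def mean_after_rounding(x: float, step: float, n: int, seed: int = 0) -> float:
    """Average of n independent stochastic roundings of the same value."""
    rng = random.Random(seed)
    return sum(stochastic_round(x, step, rng) for _ in range(n)) / n


if __name__ == "__main__":
    # 0.3 lies between the grid points 0.25 and 0.5; deterministic rounding
    # would always give 0.25, while the stochastic average stays near 0.3.
    print(mean_after_rounding(0.3, 0.25, 100_000))
```

Each individual result is still a grid value; only the average over many
elements (or many steps) tracks the unrounded number.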

This is then used by the legacy fp8 Linear implementation to set the
compute_dtype to the preferred lora dtype and then, via want_requant, cast
the result back down to fp8.

This fixes the issue where --fast fp8_matrix_mult is combined with
--fast dynamic_vram while doing a lora on an fp8_ non QT model.
2026-02-27 19:05:51 -05:00