Jedrzej Kosinski
f4b99bc623
Made multigpu deepclone load the model from disk to avoid needing to deepclone the actual model object, fixed issues with merge, and turned off the cuda backend since it causes a device mismatch issue with rope (and potentially other ops); will investigate
2026-02-17 04:55:00 -08:00
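A minimal sketch of the idea in the commit above, assuming a hypothetical build_model constructor: instead of deepcloning an already loaded model object for another GPU, re-read the weights from the checkpoint on disk and load them into a fresh instance.

import torch

def clone_model_from_disk(checkpoint_path, build_model, device):
    # Re-read the weights from disk rather than copy.deepcopy(existing_model).
    state_dict = torch.load(checkpoint_path, map_location="cpu")
    model = build_model()                  # fresh, uninitialized instance
    model.load_state_dict(state_dict)      # populate from the on-disk weights
    return model.to(device)                # place the clone on the target device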
comfyanonymous
6165c38cb5
Optimize nvfp4 lora applying. (#11866)
...
This changes the results a bit, but it also speeds things up a lot.
2026-01-14 00:49:38 -05:00
comfyanonymous
b3c0e4de57
Make loras work on nvfp4 models. (#11837)
...
The initial application is a bit slow but will probably be sped up in the future.
2026-01-12 22:33:54 -05:00
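A minimal sketch of the general pattern for patching a LoRA onto a quantized weight, assuming hypothetical dequantize/quantize callables for the nvfp4 format; the actual code path in the repo differs and is what the later "Optimize nvfp4 lora applying" commit speeds up.

import torch

def apply_lora_to_quantized(q_weight, lora_up, lora_down, alpha, dequantize, quantize):
    w = dequantize(q_weight).to(torch.float32)               # widen to a working dtype
    delta = alpha * (lora_up.float() @ lora_down.float())    # low-rank LoRA update
    return quantize(w + delta)                                # store back in quantized form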
comfyanonymous
21e8425087
Add warning for old pytorch. (#11718)
2026-01-07 21:07:26 -05:00
comfyanonymous
edee33f55e
Disable comfy kitchen cuda if pytorch cuda is less than 13 (#11681)
2026-01-06 22:13:43 -05:00
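A minimal sketch of that kind of version gate. torch.version.cuda is the CUDA version PyTorch was built against, as a string such as "12.8", or None on CPU-only builds; the actual check in the repo may differ.

import torch

def cuda_13_or_newer() -> bool:
    cuda_ver = torch.version.cuda
    if cuda_ver is None:
        return False                        # CPU-only build
    return int(cuda_ver.split(".")[0]) >= 13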
comfyanonymous
6da00dd899
Initial ops changes to use comfy_kitchen: Initial nvfp4 checkpoint support. (#11635)
...
---------
Co-authored-by: Jedrzej Kosinski <kosinkadink1@gmail.com>
2026-01-05 21:48:58 -05:00
comfyanonymous
791e30ff50
Fix nan issue when quantizing fp16 tensor. (#11213)
2025-12-09 17:03:21 -05:00
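The commit above does not spell out the cause, but a common way to get NaNs when quantizing is a zero or non-finite scale (for example, an all-zero fp16 weight producing 0/0). A defensive sketch, not the actual fix: do the reduction in float32 and clamp the scale away from zero.

import torch

def safe_fp8_scale(w: torch.Tensor) -> torch.Tensor:
    amax = w.float().abs().amax()                              # reduce in fp32, not fp16
    return (amax / torch.finfo(torch.float8_e4m3fn).max).clamp(min=1e-12)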
comfyanonymous
43071e3de3
Make old scaled fp8 format use the new mixed quant ops system. (#11000)
2025-12-05 14:35:42 -05:00
Urle Sistiana
6484ac89dc
fix QuantizedTensor.is_contiguous (#10956) (#10959)
2025-11-28 16:33:07 -05:00
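A minimal illustration, not the real QuantizedTensor: a wrapper holding packed quantized data plus scales should answer is_contiguous() based on the packed storage it actually carries, rather than whatever a default code path reports.

import torch

class PackedQuantWrapper:
    def __init__(self, packed: torch.Tensor, scale: torch.Tensor):
        self._packed = packed   # packed quantized payload
        self._scale = scale     # per-tensor or per-block scales

    def is_contiguous(self, memory_format=torch.contiguous_format) -> bool:
        # Delegate to the underlying packed storage.
        return self._packed.is_contiguous(memory_format=memory_format)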
rattus
3f382a4f98
quant ops: Dequantize weight in-place (#10935)
...
In flux2 these weights are huge (200MB). As plain_tensor is a throw-away
deep copy, do this multiplication in-place to save VRAM.
2025-11-27 08:06:30 -08:00
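A minimal sketch of the pattern described in the commit body, with illustrative names: since plain_tensor is already a throw-away copy produced during dequantization, the scale can be applied with an in-place multiply instead of allocating another full-size buffer.

import torch

def apply_scale_inplace(plain_tensor: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # Mutates the temporary in place; avoids a second ~200MB allocation
    # compared to the out-of-place form `plain_tensor * scale`.
    return plain_tensor.mul_(scale)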
comfyanonymous
bdb10a583f
Fix loras not working on mixed fp8. (#10899)
2025-11-26 00:07:58 -05:00
comfyanonymous
015a0599d0
I found a case where this is needed (#10875)
2025-11-25 03:23:19 -05:00
comfyanonymous
b6805429b9
Allow pinning quantized tensors. (#10873)
2025-11-25 02:48:20 -05:00
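A minimal sketch, assuming the quantized tensor is backed by separate packed-data and scale tensors (illustrative, not the real layout): pinning here means pinning the host memory of each component so host-to-device copies can run asynchronously.

import torch

def pin_quantized(packed: torch.Tensor, scale: torch.Tensor):
    # Pinned host tensors allow non_blocking transfers, e.g.
    #   packed_gpu = packed_pinned.to("cuda", non_blocking=True)
    return packed.pin_memory(), scale.pin_memory()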
comfyanonymous
25022e0b09
Clean up and fix issues with text encoder quants. (#10872)
2025-11-25 01:48:53 -05:00
contentis
3b3ef9a77a
Quantized Ops fixes (#10715)
...
* offload support, bug fixes, remove mixins
* add readme
2025-11-12 18:26:52 -05:00
comfyanonymous
af4b7b5edb
More fp8 torch.compile regressions fixed. (#10625)
2025-11-03 22:14:20 -05:00
comfyanonymous
6b88478f9f
Bring back fp8 torch compile performance to what it should be. (#10622)
2025-11-03 19:22:10 -05:00
comfyanonymous
e199c8cc67
Fixes (#10621)
2025-11-03 17:58:24 -05:00
comfyanonymous
958a17199a
People should update their pytorch versions. (#10618)
2025-11-03 17:08:30 -05:00
comfyanonymous
c58c13b2ba
Fix torch compile regression on fp8 ops. (#10580)
2025-11-01 00:25:17 -04:00
comfyanonymous
906c089957
Fix small performance regression with fp8 fast and scaled fp8. (#10537)
2025-10-29 19:29:01 -04:00
comfyanonymous
1a58087ac2
Reduce memory usage for fp8 scaled op. (#10531)
2025-10-29 15:43:51 -04:00
contentis
8817f8fc14
Mixed Precision Quantization System (#10498)
...
* Implement mixed precision operations with a registry design and metadata for quant spec in checkpoint.
* Updated design using Tensor Subclasses
* Fix FP8 MM
* An actually functional POC
* Remove CK reference and ensure correct compute dtype
* Update unit tests
* ruff lint
* Fix missing keys
* Rename quant dtype parameter
* Rename quant dtype parameter
* Fix unittests for CPU build
2025-10-28 16:20:53 -04:00
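A minimal sketch of the registry idea from the PR bullets above, with illustrative names and metadata layout (not the actual ComfyUI design): map a quant format name recorded per layer in the checkpoint metadata to the handler that packs/unpacks it, and leave unlisted layers in their original precision.

import torch

QUANT_REGISTRY = {}

def register_quant(name):
    # Register a handler class under a quant format name.
    def deco(cls):
        QUANT_REGISTRY[name] = cls
        return cls
    return deco

@register_quant("fp8_scaled")
class Fp8Scaled:
    @staticmethod
    def quantize(w: torch.Tensor):
        # Per-tensor scale chosen so the largest magnitude maps to the fp8 max.
        scale = w.float().abs().amax().clamp(min=1e-12) / torch.finfo(torch.float8_e4m3fn).max
        return (w / scale).to(torch.float8_e4m3fn), scale

    @staticmethod
    def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
        return q.to(torch.float32) * scale

def maybe_quantize(weight: torch.Tensor, layer_meta: dict):
    # e.g. layer_meta == {"quant": "fp8_scaled"} for quantized layers,
    # {} for layers kept in fp16/bf16 (hence "mixed precision").
    handler = QUANT_REGISTRY.get(layer_meta.get("quant"))
    return handler.quantize(weight) if handler else weight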