Two root causes fixed:
1. soft_empty_cache() and synchronize() in model_management.py lacked a
cpu_state == CPUState.CPU guard. They fell through to torch.cuda calls
that initialize a CUDA context (150-500MB VRAM) even in CPU-only mode.
2. comfy_kitchen was imported unconditionally at startup via quant_ops.py.
The import chain triggers torch.cuda.is_available() -> cuInit, which
initializes the CUDA driver. The import is now gated behind an args.cpu check.
Also adds QuantizedLayout and register_layout_op fallback stubs that the
original ImportError handler was missing.
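For reference, a minimal sketch of the guard from fix 1, assuming ComfyUI's CPUState enum and a module-level cpu_state set from the launch flags; the string return values are illustrative stand-ins, not the real function's behavior:

```python
from enum import Enum

class CPUState(Enum):
    GPU = 0
    CPU = 1
    MPS = 2

# In model_management.py this is set from the launch flags (--cpu).
cpu_state = CPUState.CPU

def soft_empty_cache(force: bool = False) -> str:
    # Return early before any torch.cuda call: merely touching torch.cuda
    # in CPU-only mode lazily initializes a CUDA context (150-500MB VRAM).
    if cpu_state == CPUState.CPU:
        return "noop"
    return "torch.cuda.empty_cache"  # stand-in for the real GPU path
```

The same early-return guard applies to synchronize().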
Amp-Thread-ID: https://ampcode.com/threads/T-019cbd03-433e-7601-93ff-3887227496b4
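The gated import and fallback stubs from fix 2 can be sketched as below; the args stand-in, the stub bodies, and the register_layout_op decorator shape are assumptions for illustration, not the exact upstream code:

```python
from types import SimpleNamespace

# Stand-in for the parsed launch arguments (real code reads the --cpu flag).
args = SimpleNamespace(cpu=True)

HAVE_COMFY_KITCHEN = False
if not args.cpu:
    try:
        # Importing comfy_kitchen can reach torch.cuda.is_available(),
        # which calls cuInit and initializes the CUDA driver.
        import comfy_kitchen  # noqa: F401
        HAVE_COMFY_KITCHEN = True
    except ImportError:
        pass

if not HAVE_COMFY_KITCHEN:
    # Fallback stubs so the rest of quant_ops.py can reference these
    # names unconditionally (signatures here are hypothetical).
    class QuantizedLayout:
        pass

    def register_layout_op(*_a, **_kw):
        def decorator(fn):
            return fn
        return decorator
```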
* Implement mixed precision operations with a registry design and metadata for the quant spec in the checkpoint.
* Updated design using Tensor Subclasses
* Fix FP8 MM
* An actually functional POC
* Remove CK reference and ensure correct compute dtype
* Update unit tests
* ruff lint
* Fix missing keys
* Rename quant dtype parameter
* Fix unittests for CPU build