Two root causes fixed:
1. soft_empty_cache() and synchronize() in model_management.py lacked a
cpu_state == CPUState.CPU guard. They fell through to torch.cuda calls
that initialize a CUDA context (150-500 MB of VRAM) even in CPU-only mode.
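A minimal sketch of the guard pattern; the CPUState enum values and the cpu_state global here approximate model_management.py and are illustrative, not the exact upstream code:

```python
import enum

class CPUState(enum.Enum):
    GPU = 0
    CPU = 1
    MPS = 2

cpu_state = CPUState.CPU  # set from CLI args at startup

def soft_empty_cache(force=False):
    # Return before touching any torch.cuda API: even a call like
    # torch.cuda.empty_cache() lazily initializes a CUDA context.
    if cpu_state == CPUState.CPU:
        return
    # ... GPU path: torch.cuda.empty_cache() etc. ...
```

The same early-return guard applies to synchronize(): the check must come before the first torch.cuda attribute access, not merely before the cache call.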
2. comfy_kitchen is imported unconditionally at startup via quant_ops.py.
The import chain triggers torch.cuda.is_available() -> cuInit, which
initializes the CUDA driver. Now gated behind an args.cpu check.
Also adds QuantizedLayout and register_layout_op fallback stubs that were
missing from the original ImportError handler.
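A sketch of the gated import plus fallback stubs; the stub bodies and the SimpleNamespace stand-in for the parsed CLI args are assumptions for illustration, not the exact quant_ops.py code:

```python
import types

# Stand-in for ComfyUI's parsed CLI args (hypothetical here).
args = types.SimpleNamespace(cpu=True)

try:
    if args.cpu:
        # In CPU-only mode, skip the import entirely so its chain
        # never reaches torch.cuda.is_available() / cuInit.
        raise ImportError("CPU mode: comfy_kitchen disabled")
    from comfy_kitchen import QuantizedLayout, register_layout_op
except ImportError:
    class QuantizedLayout:
        # Stub so isinstance()/type checks elsewhere still resolve.
        pass

    def register_layout_op(*_a, **_kw):
        # No-op decorator factory: leaves the wrapped function unchanged.
        def decorator(fn):
            return fn
        return decorator
```

Raising ImportError inside the try keeps one code path: both "library absent" and "CPU mode" land in the same handler and get the same stubs.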
Amp-Thread-ID: https://ampcode.com/threads/T-019cbd03-433e-7601-93ff-3887227496b4