Implement persistent thread pool for multi-GPU CFG splitting (#13329)

Replace per-step thread create/destroy in _calc_cond_batch_multigpu with a
persistent MultiGPUThreadPool. Each worker thread calls torch.cuda.set_device()
once at startup, preserving compiled kernel caches across diffusion steps.

- Add MultiGPUThreadPool class in comfy/multigpu.py
- Create pool in CFGGuider.outer_sample(), shut down in finally block
- Main thread handles its own device batch directly for zero overhead
- Falls back to sequential execution if no pool is available
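The core idea — one persistent worker per device that pins its CUDA device a single time — can be sketched as below. This is a minimal illustration, not the actual ComfyUI implementation; the class shape, `submit`/`wait`/`shutdown` names, and the injectable `set_device` hook are all assumptions for the sketch (in the real code the worker would call `torch.cuda.set_device()`).

```python
import queue
import threading


class MultiGPUThreadPool:
    """Sketch: one long-lived worker thread per device.

    Each worker pins its device once at startup, so per-device state
    (e.g. compiled kernel caches) survives across submitted tasks
    instead of being rebuilt every diffusion step.
    """

    def __init__(self, devices, set_device=None):
        # set_device is a stand-in for torch.cuda.set_device (hypothetical hook)
        self._set_device = set_device
        self._queues = {}
        self._threads = []
        for dev in devices:
            q = queue.Queue()
            t = threading.Thread(target=self._worker, args=(dev, q), daemon=True)
            self._queues[dev] = q
            self._threads.append(t)
            t.start()

    def _worker(self, dev, q):
        if self._set_device is not None:
            self._set_device(dev)  # called once per thread, not once per step
        while True:
            item = q.get()
            if item is None:  # shutdown sentinel
                return
            fn, args, result = item
            try:
                result["value"] = fn(*args)
            except Exception as e:  # propagate errors to the waiting caller
                result["error"] = e
            finally:
                result["done"].set()

    def submit(self, dev, fn, *args):
        """Queue fn(*args) on the worker pinned to dev; returns a handle."""
        result = {"done": threading.Event()}
        self._queues[dev].put((fn, args, result))
        return result

    def wait(self, result):
        """Block until the task finishes; re-raise any worker exception."""
        result["done"].wait()
        if "error" in result:
            raise result["error"]
        return result["value"]

    def shutdown(self):
        """Send sentinels and join all workers (e.g. from a finally block)."""
        for q in self._queues.values():
            q.put(None)
        for t in self._threads:
            t.join()
```

In the pattern described above, the main thread would run its own device's batch inline and only `submit()` the remaining devices' batches, then `wait()` on them before the next step; `shutdown()` belongs in the `finally` block of the sampling entry point so workers never outlive a sampling run.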
Author: Jedrzej Kosinski
Date: 2026-04-08 02:39:07 -10:00
Committed by: GitHub
Parent: da3864436c
Commit: 4b93c4360f
3 changed files with 108 additions and 13 deletions


@@ -11,6 +11,7 @@ import comfy.hooks
import comfy.patcher_extension
from typing import TYPE_CHECKING
if TYPE_CHECKING:
    from comfy.model_base import BaseModel
    from comfy.model_patcher import ModelPatcher
    from comfy.controlnet import ControlBase