Implement persistent thread pool for multi-GPU CFG splitting (#13329)

Replace per-step thread create/destroy in _calc_cond_batch_multigpu with a
persistent MultiGPUThreadPool. Each worker thread calls torch.cuda.set_device()
once at startup, preserving compiled kernel caches across diffusion steps.

- Add MultiGPUThreadPool class in comfy/multigpu.py
- Create pool in CFGGuider.outer_sample(), shut down in finally block
- Main thread handles its own device batch directly for zero overhead
- Falls back to sequential execution if no pool is available
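The core idea — one persistent worker per device that pins its CUDA device a single time — can be sketched as below. This is a minimal illustration, not the actual ComfyUI implementation; the class shape, `submit`/`wait`/`shutdown` names, and the injectable `set_device` hook are all assumptions for the sketch (in the real code the worker would call `torch.cuda.set_device()`).

```python
import queue
import threading


class MultiGPUThreadPool:
    """Sketch: one long-lived worker thread per device.

    Each worker pins its device once at startup, so per-device state
    (e.g. compiled kernel caches) survives across submitted tasks
    instead of being rebuilt every diffusion step.
    """

    def __init__(self, devices, set_device=None):
        # set_device is a stand-in for torch.cuda.set_device (hypothetical hook)
        self._set_device = set_device
        self._queues = {}
        self._threads = []
        for dev in devices:
            q = queue.Queue()
            t = threading.Thread(target=self._worker, args=(dev, q), daemon=True)
            self._queues[dev] = q
            self._threads.append(t)
            t.start()

    def _worker(self, dev, q):
        if self._set_device is not None:
            self._set_device(dev)  # called once per thread, not once per step
        while True:
            item = q.get()
            if item is None:  # shutdown sentinel
                return
            fn, args, result = item
            try:
                result["value"] = fn(*args)
            except Exception as e:  # propagate errors to the waiting caller
                result["error"] = e
            finally:
                result["done"].set()

    def submit(self, dev, fn, *args):
        """Queue fn(*args) on the worker pinned to dev; returns a handle."""
        result = {"done": threading.Event()}
        self._queues[dev].put((fn, args, result))
        return result

    def wait(self, result):
        """Block until the task finishes; re-raise any worker exception."""
        result["done"].wait()
        if "error" in result:
            raise result["error"]
        return result["value"]

    def shutdown(self):
        """Send sentinels and join all workers (e.g. from a finally block)."""
        for q in self._queues.values():
            q.put(None)
        for t in self._threads:
            t.join()
```

In the pattern described above, the main thread would run its own device's batch inline and only `submit()` the remaining devices' batches, then `wait()` on them before the next step; `shutdown()` belongs in the `finally` block of the sampling entry point so workers never outlive a sampling run.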
Author: Jedrzej Kosinski
Date: 2026-04-08 02:39:07 -10:00
Committed by: GitHub
Parent: da3864436c
Commit: 4b93c4360f
3 changed files with 108 additions and 13 deletions


@@ -11,6 +11,7 @@ import comfy.hooks
import comfy.patcher_extension
from typing import TYPE_CHECKING
if TYPE_CHECKING:
    from comfy.model_base import BaseModel
    from comfy.model_patcher import ModelPatcher
    from comfy.controlnet import ControlBase