Commit Graph

13 Commits

Author SHA1 Message Date
WildAi
4c9785da8b major refactoring 2025-09-10 12:06:26 +03:00
RodriMora
5f7eb9b57e Update vibevoice_nodes.py
Some people have reported better results at CFG 3.0 with lower steps.

E.g., CFG 3.0 at 3 steps yields better results than CFG 1.3 at 10 steps.
2025-09-08 17:07:32 +02:00
WildAi
2fdcec29f6 new repo link 2025-09-04 10:07:09 +03:00
WildAi
642fd3f70a fix tokenizer issue 2025-09-03 20:47:05 +03:00
WildAi
ce4a487379 fix dtype issue 2025-09-03 18:19:48 +03:00
WildAi
52cee71368 SageAttention support, fixes 2025-09-03 11:42:43 +03:00
drbaph
f565f123c6 Transformers 4.56+ Compatibility & Force Offload Fix 2025-09-01 19:26:59 +01:00
WildAi
64fdb94e16 model path update, fixes 2025-09-01 11:57:35 +03:00
Orion
7419fcd66f Add optional Q4 (4-bit) LLM quantization for VibeVoice
This PR introduces an **optional 4-bit (NF4) quantization path** for the **Qwen2.5 LLM component** inside VibeVoice, using Transformers + bitsandbytes. The diffusion head and processors remain BF16/FP32. This mirrors the project’s architecture and enables the **7B preview** to run on smaller GPUs while preserving output quality. 

**Changes / Additions:**

* New toggle to run the LLM in **4-bit NF4** via `BitsAndBytesConfig`; default remains full precision (see the sketch after this list).
* Q4 prefers **SDPA** attention (Flash-Attn auto-downshifts) for stability.
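
A minimal sketch of what such a toggle might look like with Transformers + bitsandbytes; the `use_q4` flag and model ID are illustrative placeholders, not this repo's actual code:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

use_q4 = True  # hypothetical toggle; False keeps the default full-precision path

quant_config = None
if use_q4:
    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",              # NF4 weights for the LLM only
        bnb_4bit_compute_dtype=torch.bfloat16,  # diffusion head stays BF16/FP32
        bnb_4bit_use_double_quant=True,
    )

llm = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B",                  # placeholder; VibeVoice bundles its own Qwen2.5 LLM
    quantization_config=quant_config,   # None => full precision (default)
    attn_implementation="sdpa" if use_q4 else "flash_attention_2",  # Q4 prefers SDPA
    torch_dtype=torch.bfloat16,
)
```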

**Improvements on my 3080 12GB:**

* **7B**
    * **Timing:** 29m 27s → **203.47s** (~3m 23s) — **−88.5% time** (~**8.68× faster**)
    * **VRAM:** Q4 ≈ **7.6 GB**; FP16 ≳ **12 GB** — Q4 saves **≥4.4 GB** (≥**36.7%**)
* **1.5B**
    * **Timing (Q4):** 105s → **154s** — **+49s** (~**1.47× slower**)
    * **VRAM:** Q4 ≈ **3.2 GB**; FP16 ≈ **8.7 GB** — Q4 saves **~5.5 GB** (~**63.2%**)

These changes have resulted in a nearly 90% reduction in inference time and an over 35% reduction in VRAM usage with the 7B model in VRAM-constrained environments, with no perceptible change in quality in my limited testing.

While the 1.5B model sees an increase in inference time, some may consider its roughly 63% smaller VRAM footprint worth the trade-off.
2025-08-30 23:48:25 +10:00
WildAi
48541d816d small fixes 2025-08-28 15:35:04 +03:00
drbaph
33bc1843b9 Fix memory leaks and ComfyUI model management compatibility
- Fixed IndexError in ComfyUI's model management system when unloading models
- Improved memory cleanup to prevent VRAM leaks when switching between models
- Updated cache key handling to properly track attention mode variants
- Enhanced patcher lifecycle management to work with ComfyUI's internal systems
- Added safer model cleanup that doesn't interfere with ComfyUI's model tracking (sketched below)
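
A minimal sketch of the safer-cleanup idea; the helper name is hypothetical, and the exact sequence this commit uses may differ (`comfy.model_management.soft_empty_cache` is ComfyUI's public cache-flush call):

```python
import gc
import comfy.model_management as mm

def release_model(model):
    """Hypothetical helper: drop our reference and let ComfyUI reclaim
    VRAM without mutating its internal model-tracking lists."""
    del model              # release our own reference first
    gc.collect()           # collect Python-side garbage
    mm.soft_empty_cache()  # ask ComfyUI to free cached VRAM safely
```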
2025-08-28 02:28:56 +01:00
drbaph
c29daa9050 Add configurable attention modes with compatibility checks
- Added dropdown selection for attention implementation (eager/sdpa/flash_attention_2)
- Implemented automatic compatibility checks and progressive fallbacks (sketched after this list)
- Added hardware-specific optimizations for RTX 5090/Blackwell GPUs
- Enhanced error handling to prevent crashes from incompatible attention modes
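
A sketch of the progressive-fallback idea using the Transformers loader; the chain order, function name, and caught error types are assumptions, not the node's exact logic:

```python
import torch
from transformers import AutoModelForCausalLM

def load_with_attention_fallback(model_id, preferred="flash_attention_2"):
    """Try the preferred attention implementation, then fall back to
    progressively more compatible modes (sdpa, then eager)."""
    chain = [preferred] + [m for m in ("sdpa", "eager") if m != preferred]
    last_err = None
    for mode in chain:
        try:
            return AutoModelForCausalLM.from_pretrained(
                model_id,
                attn_implementation=mode,
                torch_dtype=torch.bfloat16,
            )
        except (ImportError, ValueError) as err:  # e.g. flash-attn not installed
            last_err = err
    raise RuntimeError(f"no usable attention implementation: {last_err}")
```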
2025-08-28 01:51:29 +01:00
WildAi
66710bbffc init 2025-08-27 16:23:01 +03:00