Commit Graph

13 Commits

Author SHA1 Message Date
WildAi
4c9785da8b major refactoring 2025-09-10 12:06:26 +03:00
RodriMora
5f7eb9b57e Update vibevoice_nodes.py
Some people have reported better results at CFG 3.0 with lower steps.

E.g., CFG 3.0 at 3 steps yields better results than CFG 1.3 at 10 steps.
2025-09-08 17:07:32 +02:00
WildAi
2fdcec29f6 new repo link 2025-09-04 10:07:09 +03:00
WildAi
642fd3f70a fix tokenizer issue 2025-09-03 20:47:05 +03:00
WildAi
ce4a487379 fix dtype issue 2025-09-03 18:19:48 +03:00
WildAi
52cee71368 SageAttention support, fixes 2025-09-03 11:42:43 +03:00
drbaph
f565f123c6 Transformers 4.56+ Compatibility & Force Offload Fix 2025-09-01 19:26:59 +01:00
WildAi
64fdb94e16 model path update, fixes 2025-09-01 11:57:35 +03:00
Orion
7419fcd66f Add optional Q4 (4-bit) LLM quantization for VibeVoice
This PR introduces an **optional 4-bit (NF4) quantization path** for the **Qwen2.5 LLM component** inside VibeVoice, using Transformers + bitsandbytes. The diffusion head and processors remain BF16/FP32. This mirrors the project’s architecture and enables the **7B preview** to run on smaller GPUs while preserving output quality. 

**Changes / Additions:**

* New toggle to run the LLM in **4-bit NF4** via `BitsAndBytesConfig`; default remains full precision (see the sketch after this list).
* Q4 prefers **SDPA** attention (Flash-Attn auto-downshifts) for stability.
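
A minimal sketch of what such a toggle might look like with Transformers + bitsandbytes; the `use_q4` flag and model ID are illustrative placeholders, not this repo's actual code:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

use_q4 = True  # hypothetical toggle; False keeps the default full-precision path

quant_config = None
if use_q4:
    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",              # NF4 weights for the LLM only
        bnb_4bit_compute_dtype=torch.bfloat16,  # diffusion head stays BF16/FP32
        bnb_4bit_use_double_quant=True,
    )

llm = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B",                  # placeholder; VibeVoice bundles its own Qwen2.5 LLM
    quantization_config=quant_config,   # None => full precision (default)
    attn_implementation="sdpa" if use_q4 else "flash_attention_2",  # Q4 prefers SDPA
    torch_dtype=torch.bfloat16,
)
```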

**Improvements on my 3080 12GB:**

* **7B**
    * **Timing:** 29m 27s → **203.47s** (~3m 23s) — **−88.5% time** (~**8.68× faster**)
    * **VRAM:** Q4 ≈ **7.6 GB**; FP16 ≳ **12 GB** — Q4 saves **≥4.4 GB** (≥**36.7%**)
* **1.5B**
    * **Timing (Q4):** 105s → **154s** — **+49s** (~**1.47× slower**)
    * **VRAM:** Q4 ≈ **3.2 GB**; FP16 ≈ **8.7 GB** — Q4 saves **~5.5 GB** (~**63.2%**)

These changes have resulted in a nearly 90% reduction in inference time and an over 35% reduction in VRAM usage with the 7B model in VRAM-constrained environments, with no perceptible change in quality in my limited testing.

While the 1.5B model sees an increase in inference time, some may consider its roughly 63% smaller VRAM footprint worth the trade-off.
2025-08-30 23:48:25 +10:00
WildAi
48541d816d small fixes 2025-08-28 15:35:04 +03:00
drbaph
33bc1843b9 Fix memory leaks and ComfyUI model management compatibility
- Fixed IndexError in ComfyUI's model management system when unloading models
- Improved memory cleanup to prevent VRAM leaks when switching between models
- Updated cache key handling to properly track attention mode variants
- Enhanced patcher lifecycle management to work with ComfyUI's internal systems
- Added safer model cleanup that doesn't interfere with ComfyUI's model tracking (sketched below)
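
A minimal sketch of the safer-cleanup idea; the helper name is hypothetical, and the exact sequence this commit uses may differ (`comfy.model_management.soft_empty_cache` is ComfyUI's public cache-flush call):

```python
import gc
import comfy.model_management as mm

def release_model(model):
    """Hypothetical helper: drop our reference and let ComfyUI reclaim
    VRAM without mutating its internal model-tracking lists."""
    del model              # release our own reference first
    gc.collect()           # collect Python-side garbage
    mm.soft_empty_cache()  # ask ComfyUI to free cached VRAM safely
```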
2025-08-28 02:28:56 +01:00
drbaph
c29daa9050 Add configurable attention modes with compatibility checks
- Added dropdown selection for attention implementation (eager/sdpa/flash_attention_2)
- Implemented automatic compatibility checks and progressive fallbacks (sketched after this list)
- Added hardware-specific optimizations for RTX 5090/Blackwell GPUs
- Enhanced error handling to prevent crashes from incompatible attention modes
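
A sketch of the progressive-fallback idea using the Transformers loader; the chain order, function name, and caught error types are assumptions, not the node's exact logic:

```python
import torch
from transformers import AutoModelForCausalLM

def load_with_attention_fallback(model_id, preferred="flash_attention_2"):
    """Try the preferred attention implementation, then fall back to
    progressively more compatible modes (sdpa, then eager)."""
    chain = [preferred] + [m for m in ("sdpa", "eager") if m != preferred]
    last_err = None
    for mode in chain:
        try:
            return AutoModelForCausalLM.from_pretrained(
                model_id,
                attn_implementation=mode,
                torch_dtype=torch.bfloat16,
            )
        except (ImportError, ValueError) as err:  # e.g. flash-attn not installed
            last_err = err
    raise RuntimeError(f"no usable attention implementation: {last_err}")
```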
2025-08-28 01:51:29 +01:00
WildAi
66710bbffc init 2025-08-27 16:23:01 +03:00