40 Commits

Author SHA1 Message Date
WildAi
4fe38c9bca Update pyproject.toml 1.3.1 2025-09-03 18:20:39 +03:00
WildAi
ce4a487379 fix dtype issue 2025-09-03 18:19:48 +03:00
WildAi
e42b4aa76e Update README.md 2025-09-03 12:47:13 +03:00
WildAi
0213ce56fb Update pyproject.toml 1.3.0 2025-09-03 12:33:43 +03:00
WildAi
f9db128fdc Update README.md 2025-09-03 12:33:13 +03:00
WildAi
0b9f8a06f0 SageAttention support, fixes 2025-09-03 11:44:05 +03:00
WildAi
52cee71368 SageAttention support, fixes 2025-09-03 11:42:43 +03:00
WildAi
2aa03a8254 Update pyproject.toml 1.2.0 2025-09-01 21:42:45 +03:00
WildAi
18268d37dd Merge pull request #16 from Saganaki22/main
Transformers 4.56+ Compatibility & Force Offload Fix
2025-09-01 21:41:56 +03:00
drbaph
f565f123c6 Transformers 4.56+ Compatibility & Force Offload Fix 2025-09-01 19:26:59 +01:00
WildAi
fee5f78cc9 Update pyproject.toml 2025-09-01 13:24:47 +03:00
WildAi
8e01061d88 Merge remote-tracking branch 'origin/main' 2025-09-01 13:22:51 +03:00
WildAi
b39b784812 fixes logger 2025-09-01 13:22:19 +03:00
WildAi
2816573dea Update pyproject.toml 2025-09-01 12:41:18 +03:00
WildAi
7b9c6ce515 Update requirements.txt 2025-09-01 12:40:25 +03:00
WildAi
f44b1b103d Update README.md 2025-09-01 12:39:10 +03:00
WildAi
64fdb94e16 model path update, fixes 2025-09-01 11:57:35 +03:00
WildAi
37803a884f Update README.md 2025-09-01 11:05:30 +03:00
WildAi
2f956fb87a Merge pull request #12 from Shadowfita/patch-1
Add optional Q4 (4-bit) LLM quantization for VibeVoice
2025-08-31 20:53:05 +03:00
Orion
7419fcd66f Add optional Q4 (4-bit) LLM quantization for VibeVoice
This PR introduces an **optional 4-bit (NF4) quantization path** for the **Qwen2.5 LLM component** inside VibeVoice, using Transformers + bitsandbytes. The diffusion head and processors remain BF16/FP32. This mirrors the project’s architecture and enables the **7B preview** to run on smaller GPUs while preserving output quality. 

**Changes / Additions:**

* New toggle to run the LLM in **4-bit NF4** via `BitsAndBytesConfig`; default remains full precision.
* Q4 prefers **SDPA** attention for stability (Flash-Attention automatically downshifts to SDPA when Q4 is enabled).
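
A minimal sketch of what such a toggle can look like. The function name and kwargs layout are illustrative, not the PR's actual code; in a real loader the quantization dict below would be a `transformers.BitsAndBytesConfig` passed to `from_pretrained()`:

```python
# Hypothetical sketch of the optional Q4 toggle described above.
def build_llm_load_kwargs(use_q4: bool) -> dict:
    """Build loading kwargs for the Qwen2.5 LLM component only."""
    kwargs = {"torch_dtype": "bfloat16"}  # diffusion head/processors stay BF16/FP32
    if use_q4:
        # Optional 4-bit NF4 path via bitsandbytes; default remains full precision.
        kwargs["quantization_config"] = {
            "load_in_4bit": True,
            "bnb_4bit_quant_type": "nf4",
            "bnb_4bit_compute_dtype": "bfloat16",
        }
        # Q4 prefers SDPA attention; Flash-Attention downshifts here for stability.
        kwargs["attn_implementation"] = "sdpa"
    return kwargs
```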

**Improvements on my 3080 12GB:**

* **7B**
    * **Timing:** 29m 27s → **203.47s** (~3m 23s) — **−88.5% time** (~**8.68× faster**)
    * **VRAM:** Q4 ≈ **7.6 GB**; FP16 ≳ **12 GB** — Q4 saves **≥4.4 GB** (≥**36.7%**)
* **1.5B**
    * **Timing (Q4):** 105s → **154s** — **+49s** (~**1.47× slower**)
    * **VRAM:** Q4 ≈ **3.2 GB**; FP16 ≈ **8.7 GB** — Q4 saves **~5.5 GB** (~**63.2%**)
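
The 7B figures above are internally consistent, as a quick check shows:

```python
# Sanity-check the 7B benchmark numbers quoted above.
fp16_s = 29 * 60 + 27      # 29m 27s = 1767 s
q4_s = 203.47              # ~3m 23s
speedup = fp16_s / q4_s                      # ~8.68x faster
time_saved_pct = 100 * (1 - q4_s / fp16_s)   # ~88.5% less time
vram_saved_pct = 100 * (12.0 - 7.6) / 12.0   # >=4.4 GB of >=12 GB, ~36.7%
```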

With the 7B model, these changes deliver a nearly 90% reduction in inference time and a VRAM saving of over a third in VRAM-constrained environments, with no perceptible change in quality in my limited testing.

While inference time does increase with the 1.5B model, some may consider its much smaller VRAM footprint worth the trade-off.
2025-08-30 23:48:25 +10:00
WildAi
20baa02e9a Update pyproject.toml 2025-08-28 16:08:30 +03:00
WildAi
48541d816d small fixes 2025-08-28 15:35:04 +03:00
WildAi
4da796065a Update README.md 2025-08-28 15:09:12 +03:00
WildAi
e5102dc535 Update pyproject.toml 2025-08-28 09:24:49 +03:00
WildAi
ecc129ddbe Merge pull request #7 from Saganaki22/main
Add configurable attention modes with compatibility checks
2025-08-28 09:19:53 +03:00
drbaph
33bc1843b9 Fix memory leaks and ComfyUI model management compatibility
- Fixed IndexError in ComfyUI's model management system when unloading models
- Improved memory cleanup to prevent VRAM leaks when switching between models
- Updated cache key handling to properly track attention mode variants
- Enhanced patcher lifecycle management to work with ComfyUI's internal systems
- Added safer model cleanup that doesn't interfere with ComfyUI's model tracking
2025-08-28 02:28:56 +01:00
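
The cache-key and cleanup behavior this commit describes can be sketched with a simple dict-backed cache. All names here are hypothetical; ComfyUI's actual patcher/model-management API differs:

```python
import gc


class ModelCache:
    """Illustrative cache keyed by attention-mode variant, per the commit above."""

    def __init__(self):
        self._models = {}

    def get_or_load(self, model_path, attention_mode, loader):
        # Include the attention mode in the key so each variant is tracked separately.
        key = (model_path, attention_mode)
        if key not in self._models:
            self.unload_all()  # free the previous variant first to avoid VRAM leaks
            self._models[key] = loader(model_path, attention_mode)
        return self._models[key]

    def unload_all(self):
        self._models.clear()
        gc.collect()  # in a real node this would be paired with torch.cuda.empty_cache()
```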
drbaph
c29daa9050 Add configurable attention modes with compatibility checks
- Added dropdown selection for attention implementation (eager/sdpa/flash_attention_2)
- Implemented automatic compatibility checks and progressive fallbacks  
- Added hardware-specific optimizations for RTX 5090/Blackwell GPUs
- Enhanced error handling to prevent crashes from incompatible attention modes
2025-08-28 01:51:29 +01:00
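
The progressive fallback this commit adds can be sketched as follows; the function name and availability flags are illustrative, not the commit's actual code:

```python
def pick_attention_impl(requested, flash_available, sdpa_available):
    """Fall back from the requested attention mode to the next compatible one.

    'eager' is the universal fallback, so a valid mode is always returned
    instead of crashing on an incompatible selection.
    """
    order = ["flash_attention_2", "sdpa", "eager"]
    start = order.index(requested) if requested in order else 0
    for mode in order[start:]:
        if mode == "flash_attention_2" and not flash_available:
            continue  # e.g. flash-attn not installed, or unsupported hardware
        if mode == "sdpa" and not sdpa_available:
            continue  # e.g. a PyTorch build without scaled_dot_product_attention
        return mode
    return "eager"
```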
WildAi
3a2d2aa775 Update pyproject.toml 2025-08-27 18:22:35 +03:00
WildAi
9077a28350 Merge remote-tracking branch 'origin/main' 2025-08-27 18:20:41 +03:00
WildAi
27829a377b requirements 2025-08-27 18:19:54 +03:00
WildAi
757594f9da Update README.md 2025-08-27 17:13:39 +03:00
WildAi
4948f10db0 Update pyproject.toml 2025-08-27 16:30:46 +03:00
WildAi
66710bbffc init 2025-08-27 16:23:01 +03:00
WildAi
7f85938083 Update README.md 2025-08-27 16:21:58 +03:00
WildAi
1be4756948 init examples 2025-08-27 15:53:34 +03:00
WildAi
4056f54f86 init 2025-08-27 15:51:44 +03:00
WildAi
6dc500b9ad Update pyproject.toml 2025-08-27 14:54:59 +03:00
WildAi
e87eb9a6ba Create publish.yml 2025-08-27 14:49:07 +03:00
WildAi
ad2614c658 Add files via upload 2025-08-27 14:48:25 +03:00
WildAi
68a87e5e88 Initial commit 2025-08-27 14:42:07 +03:00