{ "id": "b91265e5-1b03-4b63-8dc3-4abd9a030e08", "revision": 0, "last_node_id": 14, "last_link_id": 44, "nodes": [ { "id": 3, "type": "SaveAudio", "pos": [ -1040, -1130 ], "size": [ 270, 112 ], "flags": {}, "order": 6, "mode": 0, "inputs": [ { "name": "audio", "type": "AUDIO", "link": 27 } ], "outputs": [], "properties": { "Node name for S&R": "SaveAudio", "cnr_id": "comfy-core", "ver": "0.3.52", "ue_properties": { "widget_ue_connectable": { "filename_prefix": true, "audioUI": true }, "version": "7.0.1" } }, "widgets_values": [ "audio/VibeVoice" ] }, { "id": 13, "type": "MarkdownNote", "pos": [ -1898.1748046875, -1409.22314453125 ], "size": [ 1035.619873046875, 211.96694946289062 ], "flags": {}, "order": 0, "mode": 0, "inputs": [], "outputs": [], "title": "Note", "properties": {}, "widgets_values": [ "# ComfyUI-VibeVoice\n\nVibeVoice is a novel framework by Microsoft for generating expressive, long-form, multi-speaker conversational audio. It excels at creating natural-sounding dialogue, podcasts, and more, with consistent voices for up to 4 speakers.\n\n**✨ Key Features:**\n* **Multi-Speaker TTS:** Generate conversations with up to 4 distinct voices in a single audio output.\n* **High-Fidelity Voice Cloning:** Use any audio file (`.wav`, `.mp3`) as a reference for a speaker's voice.\n* **Hybrid Generation Mode:** Mix and match cloned voices with high-quality, zero-shot generated voices in the same script.\n* **Flexible Scripting:** Use simple `[1]` tags or the classic `Speaker 1:` format to write your dialogue.\n* **Advanced Attention Mechanisms:** Choose between `eager`, `sdpa`, `flash_attention_2`, and the high-performance `sage` attention for fine-tuned control over speed and compatibility.\n* **Robust 4-Bit Quantization:** Run the large language model component in 4-bit mode to significantly reduce VRAM usage.\n* **Automatic Model Management:** Models are downloaded automatically and managed efficiently by ComfyUI to save VRAM." 
], "color": "#233", "bgcolor": "#355" }, { "id": 4, "type": "LoadAudio", "pos": [ -1900, -1130 ], "size": [ 272.9800720214844, 136 ], "flags": {}, "order": 1, "mode": 0, "inputs": [], "outputs": [ { "name": "AUDIO", "type": "AUDIO", "links": [] } ], "properties": { "Node name for S&R": "LoadAudio", "cnr_id": "comfy-core", "ver": "0.3.52", "ue_properties": { "widget_ue_connectable": { "audio": true, "audioUI": true, "upload": true }, "version": "7.0.1" } }, "widgets_values": [ "male_rickmorty.mp3", null, null ] }, { "id": 8, "type": "LoadAudio", "pos": [ -1901.10009765625, -948.7998046875 ], "size": [ 274.080078125, 136 ], "flags": {}, "order": 2, "mode": 0, "inputs": [], "outputs": [ { "name": "AUDIO", "type": "AUDIO", "links": [] } ], "properties": { "Node name for S&R": "LoadAudio", "cnr_id": "comfy-core", "ver": "0.3.52", "ue_properties": { "widget_ue_connectable": { "audio": true, "audioUI": true, "upload": true }, "version": "7.0.1" } }, "widgets_values": [ "male_stewie.mp3", null, null ] }, { "id": 12, "type": "MarkdownNote", "pos": [ -1915.701904296875, -762.380126953125 ], "size": [ 312.85455322265625, 292.8734130859375 ], "flags": {}, "order": 3, "mode": 0, "inputs": [], "outputs": [], "title": "Note", "properties": {}, "widgets_values": [ "### Scripting and Voice Modes\n\n#### Speaker Tagging\nYou can assign lines to speakers in two ways. Both are treated identically.\n\n* **Modern Format (Recommended):** `[1] This is the first speaker.`\n* **Classic Format:** `Speaker 1: This is the first speaker.`\n\nYou can also add an optional colon to the modern format (e.g., `[1]: ...`). The node handles all variations consistently.\n\n#### Hybrid Voice Generation\nThis is a powerful feature that lets you mix cloned voices and generated (zero-shot) voices.\n\n* **To Clone a Voice:** Connect a `Load Audio` node to the speaker's input (e.g., `speaker_1_voice`).\n* **To Generate a Voice:** Leave the speaker's input empty. 
The model will create a unique, high-quality voice for that speaker." ], "color": "#233", "bgcolor": "#355" }, { "id": 14, "type": "MarkdownNote", "pos": [ -1048.3660888671875, -960.8771362304688 ], "size": [ 280.797607421875, 487.02728271484375 ], "flags": {}, "order": 4, "mode": 0, "inputs": [], "outputs": [], "title": "Note", "properties": {}, "widgets_values": [ "## Models\n\nModels are downloaded automatically on the first run. Alternatively, download them manually and place them in the directory: /models/tts/VibeVoice\n\n| Model | Context Length | Generation Length | Weights |\n|-------|----------------|-------------------|---------|\n| VibeVoice-1.5B | 64K | ~90 min | [HF link](https://huggingface.co/microsoft/VibeVoice-1.5B) |\n| VibeVoice-Large | 32K | ~45 min | [HF link](https://huggingface.co/microsoft/VibeVoice-Large) |\n\n## Support\n\n- Don't know how to update PyTorch?\n- Need help with ComfyUI?\n- Need technical support?\n\n### Or do you just have questions? Then join the [@TokenDiffusion Hub](https://t.me/TokenDiff_hub) group\n\n### AI news [TokenDiffusion](https://t.me/TokenDiff)" ], "color": "#233", "bgcolor": "#355" }, { "id": 11, "type": "VibeVoiceTTS", "pos": [ -1570, -1130 ], "size": [ 475.3999938964844, 662.9000244140625 ], "flags": {}, "order": 5, "mode": 0, "inputs": [ { "name": "speaker_1_voice", "shape": 7, "type": "AUDIO", "link": null }, { "name": "speaker_2_voice", "shape": 7, "type": "AUDIO", "link": null }, { "name": "speaker_3_voice", "shape": 7, "type": "AUDIO", "link": null }, { "name": "speaker_4_voice", "shape": 7, "type": "AUDIO", "link": null } ], "outputs": [ { "name": "AUDIO", "type": "AUDIO", "links": [ 27 ] } ], "properties": { "Node name for S&R": "VibeVoiceTTS", "cnr_id": "ComfyUI-VibeVoice", "ver": "37803a884fb8f9b43c38286f6d654c7f97181a73", "ue_properties": { "widget_ue_connectable": { "model_name": true, "text": true, "quantize_llm_4bit": true, "attention_mode": true, "cfg_scale": true, "inference_steps": true, "seed": true, "do_sample": true, 
"temperature": true, "top_p": true, "top_k": true }, "version": "7.0.1" } }, "widgets_values": [ "VibeVoice-1.5B", "[1] I can't believe you did it again. I waited for two hours. Two hours! Not a single call, not a text. Do you have any idea how embarrassing that was, just sitting there alone?\n[2] Look, I know, I'm sorry, alright? Work was a complete nightmare. My boss dropped a critical deadline on me at the last minute. I didn't even have a second to breathe, let alone check my phone.\n", false, "flash_attention_2", 1.3, 10, 471935335072093, "fixed", true, 0.95, 0.95, 0, false ], "color": "#232", "bgcolor": "#353" } ], "links": [ [ 27, 11, 0, 3, 0, "AUDIO" ] ], "groups": [], "config": {}, "extra": { "ds": { "scale": 0.8264462809917354, "offset": [ 2015.701904296875, 1509.22314453125 ] }, "ue_links": [], "links_added_by_ue": [], "frontendVersion": "1.26.11", "VHS_latentpreview": false, "VHS_latentpreviewrate": 0, "VHS_MetadataImage": true, "VHS_KeepIntermediate": true }, "version": 0.4 }