mirror of
https://github.com/SillyTavern/SillyTavern-Extras.git
synced 2026-04-28 18:31:19 +00:00
Squashed commit of the following:
- Add experimental visual postproc chain (lo-fi scifi hologram)
  - TODO: not configurable yet, it's always on until I fix this
- Improve emotion preset loading logic
  - Even if an emotion preset JSON is missing, load the emotion from `_defaults.json`.
- Add blunder recovery (emotion preset factory reset) options to manual poser
- Fix factory-default preset name angry -> anger
- Manual poser: return nonzero exit code on init error
- Manual poser too now auto-installs THA3 models if needed
- Move TODO list into its own file, dump everything there
- Add a README for the new revised talkinghead
@@ -92,7 +92,8 @@ parser.add_argument(
     choices=["auto", "standard_float", "separable_float", "standard_half", "separable_half"],
 )
 parser.add_argument(
-    "--talkinghead-models", type=str, help="If THA3 models are not yet installed, use the given HuggingFace repository to install them.",
+    "--talkinghead-models", metavar="HFREPO",
+    type=str, help="If THA3 models are not yet installed, use the given HuggingFace repository to install them. Defaults to OktayAlpk/talking-head-anime-3.",
+    default="OktayAlpk/talking-head-anime-3"
 )
@@ -203,7 +204,10 @@ if "talkinghead" in modules:
         )
         os.makedirs(talkinghead_models_dir, exist_ok=True)
         print(f"THA3 models not yet installed. Installing from {args.talkinghead_models} into talkinghead/tha3/models.")
-        # TODO: I'd prefer to install with symlinks, but how about Windows users?
+        # Installing with symlinks would be generally better, but MS Windows support for symlinks is not optimal,
+        # so for maximal compatibility we avoid them. The drawback of installing directly as plain files is that
+        # if multiple programs need to download THA3, they will do so separately. But THA3 is rather rare, so in
+        # practice this is unlikely to be an issue.
         snapshot_download(repo_id=args.talkinghead_models, local_dir=talkinghead_models_dir, local_dir_use_symlinks=False)

 import sys
talkinghead/README.md (new file, 163 lines)
@@ -0,0 +1,163 @@
## Talkinghead

<!-- markdown-toc start - Don't edit this section. Run M-x markdown-toc-refresh-toc -->
**Table of Contents**

- [Talkinghead](#talkinghead)
    - [Introduction](#introduction)
    - [Live mode](#live-mode)
    - [Manual poser](#manual-poser)
    - [Creating a character](#creating-a-character)
    - [Tips for Stable Diffusion](#tips-for-stable-diffusion)
    - [Acknowledgements](#acknowledgements)

<!-- markdown-toc end -->
### Introduction

This module renders a **live, AI-based custom anime avatar for your AI character**.

The end result is similar to that generated by VTuber software such as *Live2D*, but this works differently. We use the THA3 AI posing engine, which takes **a single static image** of the character as input. It can vary the character's expression, and pose some joints by up to 15 degrees. Modern GPUs have enough compute to do this in realtime.

This has some implications:

- You can produce new characters in a fast and agile manner.
- One expression image is enough. There is no need to make all 28 manually.
- If you need to modify some details of the character's outfit, just edit the image (either manually, or with Stable Diffusion/ControlNet).
- We can produce parametric animation on the fly, just like from a traditional 2D or 3D model - but here the model is a generative AI.

As with any AI technology, there are limitations. The AI-generated output image may not be perfect, and in particular the model does not support characters wearing large hats or props. For details (and example outputs), refer to the original author's [tech report](https://web.archive.org/web/20220606125507/https://pkhungurn.github.io/talking-head-anime-3/full.html).

Still images do not do the system justice; the realtime animation is a large part of its appeal. Preferences vary here, but if you have the hardware, try it - you might like it. If you prefer still images, and don't create new characters often, you may get better results by inpainting expression sprites in Stable Diffusion.
### Live mode

The live mode is activated by:

- Loading the `talkinghead` module in *SillyTavern-extras*, and
- In *SillyTavern* settings, checking the checkbox *Extensions ⊳ Character Expressions ⊳ Image Type - talkinghead (extras)*.
- Your character must have a `SillyTavern/public/characters/yourcharacternamehere/talkinghead.png` for this to work. You can upload one in the settings.

CUDA (*SillyTavern-extras* option `--talkinghead-gpu`) is very highly recommended. As of late 2023, a recent GPU is also recommended. For example, on a laptop with an RTX 3070 Ti mobile GPU, using the `separable_half` THA3 model (the fastest and smallest; the default when running on GPU), you can expect ≈40-50 FPS render performance. VRAM usage in this case is about 520 MB. A CPU mode exists, but it is very slow, roughly 2 FPS on an i7-12700H.

We rate-limit the output to a maximum of 25 FPS to avoid DoSing the SillyTavern GUI, and attempt to hold a constant 25 FPS. If the renderer runs faster, average GPU usage will be lower, because the animation engine only generates as many frames as are actually consumed. If the renderer runs slower, the latest available frame is re-sent as many times as needed, isolating the client side from any render hiccups.
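The pacing idea described above (send at a fixed rate, re-send the latest frame if no fresh one has arrived) can be sketched in a few lines. This is a simplified illustration, not the actual SillyTavern-extras implementation; the class and method names here are made up:

```python
import time
from typing import Callable, Optional

TARGET_FPS = 25  # the README's stated output cap

class FramePacer:
    """Send frames at a constant rate, re-sending the latest available
    frame when the renderer falls behind (a simplified sketch)."""
    def __init__(self, target_fps: float = TARGET_FPS):
        self.interval = 1.0 / target_fps
        self.latest_frame: Optional[bytes] = None

    def submit(self, frame: bytes) -> None:
        """Called by the renderer whenever a new frame is ready."""
        self.latest_frame = frame

    def run(self, send: Callable[[bytes], None], n_ticks: int) -> None:
        """Send one frame per interval for `n_ticks` intervals.
        If no new frame arrived since the last tick, the previous
        frame is sent again, isolating the client from render hiccups."""
        for _ in range(n_ticks):
            if self.latest_frame is not None:
                send(self.latest_frame)  # may be a repeat
            time.sleep(self.interval)
```

Because the sender never blocks on the renderer, a slow renderer degrades only animation smoothness, not the stream's frame rate.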
To customize which THA3 model to use, and where to install the THA3 models from, see the `--talkinghead-model=...` and `--talkinghead-models=...` options, respectively.

If the directory `talkinghead/tha3/models/` (under the top level of *SillyTavern-extras*) does not exist, the model files are automatically downloaded from HuggingFace and installed there.
### Manual poser

This is a standalone wxPython app that you can run locally on the machine where you installed *SillyTavern-extras*. It is based on the original manual poser app in the THA3 tech demo, but this version adds some important convenience features and usability improvements.

It uses the same models as the live mode. If the directory `talkinghead/tha3/models/` (under the top level of *SillyTavern-extras*) does not exist, the model files are automatically downloaded from HuggingFace and installed there.

With this app, you can:

- **Graphically edit the emotion templates** used by the live mode.
  - They are JSON files, found in `talkinghead/emotions/` under your *SillyTavern-extras* folder.
  - The GUI also has a dropdown to quickload any preset.
  - **NEVER** delete or modify `_defaults.json`. That file stores the factory settings, and the app will not run without it.
  - For blunder recovery: to reset an emotion back to its factory setting, see the `--factory-reset=EMOTION` option, which uses the factory settings to overwrite the corresponding emotion preset JSON. To reset **all** emotion presets to factory settings, see `--factory-reset-all`. Careful: these operations **cannot** be undone!
    - Currently, these options do **NOT** regenerate the example images also provided in `talkinghead/emotions/`.
- **Batch-generate the 28 static expression sprites** for a character.
  - The input is the same single static image format as used by the live mode.
  - You can then use the generated images as the static expression sprites for your AI character. There is no need to run the live mode.

To run the manual poser:

- Open a terminal in your `talkinghead` subdirectory
- `conda activate extras`
- `python -m tha3.app.manual_poser`
- For systems with `bash`, a convenience wrapper `./start_manual_poser.sh` is included.

Run the poser with the `--help` option for a description of its command-line options. The command-line options of the manual poser are **completely independent** of the options of *SillyTavern-extras* itself.

Currently, you can choose the device to run on (GPU or CPU), and which THA3 model to use. By default, the manual poser uses the GPU and the `separable_float` model.

GPU mode gives the best response, but CPU mode (~2 FPS) is useful at least for batch-exporting static sprites when your VRAM is already full of AI.

To load a PNG image or emotion JSON, you can use the buttons, their hotkeys, or **drag'n'drop a PNG or JSON** file from your favorite file manager into the source image pane.
### Creating a character

To create an AI avatar that `talkinghead` understands:

- The image must be of size 512x512, in PNG format.
- **The image must have an alpha channel**.
  - Any pixel with nonzero alpha is part of the character.
  - If the edges of the silhouette look like a cheap photoshop job, check them manually for background bleed.
- Using any method you prefer, create a front view of your character within [these specifications](Character_Card_Guide.png).
  - In practice, you can create an image of the character in the correct pose first, and align it as a separate step.
  - If you use Stable Diffusion, see the separate section below.
- To add an alpha channel to an image where the character is otherwise fine, but sits on a background:
  - In Stable Diffusion, you can try the [rembg](https://github.com/AUTOMATIC1111/stable-diffusion-webui-rembg) extension for Automatic1111 to get a rough first approximation.
  - You can also try the *Fuzzy Select* (magic wand) tool in traditional image editors such as GIMP or Photoshop.
  - Manual pixel-by-pixel editing of the edges is recommended for best results. This takes about 20 minutes per character.
    - If you rendered the character on a light background, use a dark background layer when editing the edges, and vice versa.
    - This makes it much easier to see which pixels have background bleed and need to be erased.
- Finally, align the character on the canvas.
  - We recommend using [the THA3 example character](tha3/images/example.png) as an alignment template.
- **IMPORTANT**: Export the final edited image, *without any background layer*, as a PNG with an alpha channel.
- Load the result into *SillyTavern* as a `talkinghead.png`, and see how well it performs.
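The basic format requirements above (512x512 PNG with an alpha channel) are easy to check programmatically before uploading. A small sketch using Pillow; the function name `check_talkinghead_image` is made up for illustration and is not part of the package:

```python
from PIL import Image

def check_talkinghead_image(path) -> list[str]:
    """Return a list of problems with a candidate talkinghead input image.
    An empty list means the basic format requirements are satisfied."""
    problems = []
    img = Image.open(path)
    if img.format != "PNG":
        problems.append(f"expected PNG, got {img.format}")
    if img.size != (512, 512):
        problems.append(f"expected 512x512, got {img.size[0]}x{img.size[1]}")
    if img.mode != "RGBA":
        problems.append(f"expected an alpha channel (RGBA), got mode {img.mode}")
    return problems
```

Note that this only verifies the container format; whether the silhouette edges are clean still needs the manual inspection described above.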
### Tips for Stable Diffusion

It is possible to create a suitable character render with Stable Diffusion. We assume that you already have a local installation of the [Automatic1111](https://github.com/AUTOMATIC1111/stable-diffusion-webui) webui.

- Don't initially worry about the alpha channel. You can add it after you have generated the image.
- Try the various VTuber checkpoints floating around the Internet.
  - These are trained on talking anime heads in particular, so it's much easier to get a pose that works as input for THA3.
  - Many human-focused SD checkpoints render at their best quality at 512x768 (portrait). You can always crop the image later.
- I've had good results with `meina-pro-mistoon-hll3`.
  - It can produce good quality anime art (that looks like it came from an actual anime), and it knows how to pose a talking head.
  - It's capable of NSFW, so be careful. Use the negative prompt appropriately.
- As the VAE, the standard `vae-ft-mse-840000-ema-pruned.ckpt` is fine.
- Settings: *512x768, 20 steps, DPM++ 2M Karras, CFG scale 7*.
- Optionally, you can use the [Dynamic Thresholding (CFG Scale Fix)](https://github.com/mcmonkeyprojects/sd-dynamic-thresholding) extension for Automatic1111 to render the image at CFG 15 (to increase the chances of SD following the prompt correctly), while making the result look as if it had been rendered at CFG 7.
  - Recommended settings: *Half Cosine Up, minimum CFG scale 3, mimic CFG scale 7*, all else at default values.
- Expect to render **upwards of a hundred** *txt2img* gens to get **one** result good enough for further refinement. (At least you can produce and triage them quickly.)
- **Make it easy for yourself to find and fix the edges.**
  - If your character's outline consists mainly of dark colors, ask for a light background, and vice versa.
- As always with SD, some unexpected words may generate undesirable elements that are impossible to get rid of.
  - For example, I wanted an AI character wearing a *"futuristic track suit"*, but SD interpreted the *"futuristic"* to mean that the character should be posed on a background containing unrelated scifi tech greebles, or worse, that the result should look something like the female lead of [*Saikano* (2002)](https://en.wikipedia.org/wiki/Saikano). Removing that word solved it, but also changed the outfit style.

**Prompt**:

```
(front view, symmetry:1.2), ...character description here..., standing, arms at sides, open mouth, smiling,
simple white background, single-color white background, (illustration, 2d, cg, masterpiece:1.2)
```

The `front view` and `symmetry`, appropriately weighted and placed at the beginning, greatly increase the chances of actually getting a direct front view.

**Negative prompt**:

```
(three quarters view, detailed background:1.2), full body shot, (blurry, sketch, 3d, photo:1.2),
...character-specific negatives here..., negative_hand-neg, verybadimagenegative_v1.3
```

As usual, the negative embeddings can be found on [Civitai](https://civitai.com/) ([negative_hand-neg](https://civitai.com/models/56519), [verybadimagenegative_v1.3](https://civitai.com/models/11772)).

Then just test it, and add NSFW terms to the negative prompt if needed.

The camera angle terms in the prompt may need some experimentation. Above, we put `full body shot` in the negative prompt, because in SD 1.5, at least with many anime models, full body shots often get a garbled face. However, a full body shot can actually be useful here, because it includes the legs, so you can crop them at whatever point is needed to align the character's face with the template.

One possible solution is to ask for a `full body shot`, and *txt2img* only for a good pose and composition, no matter the face. Then *img2img* the result, using the [ADetailer](https://github.com/Bing-su/adetailer) extension for Automatic1111 (0.75 denoise, with [ControlNet inpaint](https://stable-diffusion-art.com/controlnet/#ControlNet_Inpainting) enabled) to fix the face.

**ADetailer notes**

- Some versions of ADetailer may fail to render anything into the final output image if the main denoise is set to 0, no matter the ADetailer denoise setting.
  - To work around this, use a small value for the main denoise (0.05) to force it to render, without changing the rest of the image too much.
- When inpainting, **the inpaint mask must cover the whole area that contains the features to be detected**. Otherwise ADetailer will start processing correctly, but since the inpaint mask doesn't cover the area to be edited, it can't write there in the final output image.
  - This makes sense in hindsight: when inpainting, the area to be edited must be masked. It doesn't matter how the inpainted image data is produced.
### Acknowledgements

This software incorporates the [THA3](https://github.com/pkhungurn/talking-head-anime-3-demo) AI-based anime posing engine developed by Pramook Khungurn. The THA3 code is used under the MIT license, and the THA3 AI models are used under the Creative Commons Attribution 4.0 International license. The THA3 example character is used under the Creative Commons Attribution-NonCommercial 4.0 International license.

The manual poser code has been mostly rewritten, and the live mode code is original to this software.
talkinghead/TODO.md (new file, 61 lines)
@@ -0,0 +1,61 @@
|
||||
## Talkinghead TODO
|
||||
|
||||
### Live mode
|
||||
|
||||
- Improve frame timing
|
||||
- Try to keep the output FPS constant
|
||||
- Use a queue instead of a polling loop if Python can efficiently I/O wait on them?
|
||||
- Render one frame at startup
|
||||
- Make the network streamer send the available frame and then request a new frame immediately
|
||||
- Then calculate how much time is left until the next send deadline, and sleep for that, then repeat
|
||||
- Decouple animation speed from render FPS; need to calibrate against wall time.
|
||||
- OTOH, do we need to do this? Only needed for slow renderers, because if render FPS > network FPS,
|
||||
the rate limiter already makes the animation run at a constant FPS.
|
||||
- Make cool-looking optional output filters:
|
||||
- Static scanlines
|
||||
- Dynamic scanlines (odd/even lines every other frame)
|
||||
- Like Phosphor deinterlacer in VLC.
|
||||
- To look good, requires a steady output FPS, and either sync to display refresh, or high enough
|
||||
display refresh that syncing doesn't matter (could work for 24 FPS stream on a 144Hz panel).
|
||||
- Luma noise
|
||||
- Needs torch kernels to do these on the GPU?
|
||||
- Make the various hyperparameters user-configurable (ideally per character, but let's make a global version first):
|
||||
- Blink timing: `blink_interval` min/max
|
||||
- Blink probability per frame
|
||||
- "confusion" emotion initial segment duration (where blinking quickly in succession is allowed)
|
||||
- Sway timing: `sway_interval` min/max
|
||||
- Sway strength (`max_random`, `max_noise`)
|
||||
- Breathing cycle duration
|
||||
- Output target FPS
|
||||
- Separate animation target FPS?
|
||||
- PNG sending efficiency? Look into encoding the stream into YUVA420 using `ffmpeg`.
|
||||
- Investigate if some particular emotions could use a small random per-frame oscillation applied to "iris_small",
|
||||
for that anime "intense emotion" effect (since THA3 doesn't have a morph specifically for the specular reflections in the eyes).
|
||||
- The "eye_unimpressed" morph has just one key in the emotion JSON, although the model has two morphs (left and right) for this.
|
||||
- We should fix this, but it will break backward compatibility for old emotion JSON files.
|
||||
- OTOH, maybe not much of an issue, because in all versions prior to this one being developed, the emotion JSON system
|
||||
was underutilized anyway (only a bunch of pre-made presets, only used by the live plugin).
|
||||
- All the more important to fix this now, before the next release, because the improved manual poser makes it easy to
|
||||
generate new emotion JSON files, so from the next release on we can assume those to exist in the wild.
|
||||
|
||||
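The "luma noise" filter idea from the list above could run on the GPU as a small torch kernel, in the same in-place style as the postproc filters added in this commit. A hedged sketch; the function name and the `magnitude` parameter are illustrative, not part of the codebase:

```python
import torch

def apply_lumanoise(image: torch.Tensor, magnitude: float = 0.1) -> None:
    """Add dynamic per-frame noise to the brightness of an RGBA image.

    `image`: float tensor of shape [c, h, w], RGBA data in [0, 1].
    Modifies `image` in place, like the other postproc filters.
    """
    c, h, w = image.shape
    # Multiplicative noise in [1 - magnitude, 1], applied equally to R, G and B,
    # so only the perceived brightness flickers, not the hue.
    noise = (1.0 - magnitude) + magnitude * torch.rand(h, w, device=image.device)
    image[:3, :, :].mul_(noise)  # the [h, w] noise broadcasts over the channel dim
    image.clamp_(min=0.0, max=1.0)
```

A new noise tensor is drawn every call, so calling this once per frame gives the dynamic flicker; ideally this would operate on the Y channel in YUV space, like the scanline TODO notes.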
### Client-side bugs / missing features

- Talking animation is broken; it seems the client isn't sending us a request to start/stop talking.
- Add `/emote xxx` support for talkinghead; it would make testing much easier.
  - Needs a new API endpoint ("emote"?) in `server.py`, and making the client call that when `/emote xxx` is used.
- If `classify` is enabled, the emotion state could be updated from the latest AI-generated text when switching chat files, to resume in the same state where the chat left off.
  - Either call the "classify" endpoint (which will re-analyze), or if the client stores the emotion, then the new "emote" endpoint.
- When a new talkinghead sprite is uploaded:
  - The preview thumbnail doesn't update.
  - Talkinghead must be switched off and back on to actually send the new image to the backend.
### Common

- Add pictures to the README.
  - Screenshot of the manual poser. Anything else that the user needs to know about it?
  - Examples of generated poses, highlighting both success and failure cases. How it looks in the actual GUI.
- Merge appropriate material from the old user manual into the new README.
- Update the user manual.
- Far future: lip-sync the talking animation to TTS output (needs realtime data from the client).
@@ -91,7 +91,7 @@
         "body_z_index": 0.0,
         "breathing_index": 0.0
     },
-    "angry": {
+    "anger": {
         "eyebrow_troubled_left_index": 0.0,
         "eyebrow_troubled_right_index": 0.0,
         "eyebrow_angry_left_index": 1.0,
@@ -1261,4 +1261,4 @@
         "body_z_index": 0.0,
         "breathing_index": 0.0
     }
 }
@@ -6,25 +6,6 @@ This module implements the live animation backend and serves the API. For usage,
 If you want to play around with THA3 expressions in a standalone app, see `manual_poser.py`.
 """

-# TODO: talkinghead live mode:
-#  - Make the various hyperparameters user-configurable (ideally per character, but let's make a global version first):
-#    - Blink timing: `blink_interval` min/max
-#    - Blink probability per frame
-#    - "confusion" emotion initial segment duration (where blinking quickly in succession is allowed)
-#    - Sway timing: `sway_interval` min/max
-#    - Sway strength (`max_random`, `max_noise`)
-#    - Breathing cycle duration
-#  - Client-side bugs / missing features:
-#    - Talking animation is broken, seems the client isn't sending us a request to start/stop talking.
-#    - If `classify` is enabled, emotion state could be updated from the latest AI-generated text
-#      when switching chat files, to resume in the same state where the chat left off.
-#    - When a new talkinghead sprite is uploaded:
-#      - The preview thumbnail doesn't update
-#      - Talkinghead must be switched off and back on to actually send the new image to the backend
-#  - PNG sending efficiency? Look into encoding the stream into YUVA420 using `ffmpeg`.
-# TODO: talkinghead common:
-#  - Write new README: use case and supported features are different from the original THA3 package.
-
 import atexit
 import io
 import logging
@@ -40,6 +21,7 @@ from typing import Dict, List, NoReturn, Optional, Union
 import PIL

+import torch
 import torchvision

 from flask import Flask, Response
 from flask_cors import CORS
@@ -291,6 +273,8 @@ class TalkingheadAnimator:

         self.breathing_epoch = time.time_ns()

+        self.frame_evenodd = 0
+
     def load_image(self, file_path=None) -> None:
         """Load the image file at `file_path`, and replace the current character with it.
@@ -577,11 +561,82 @@ class TalkingheadAnimator:
         pose = torch.tensor(self.current_pose, device=self.device, dtype=self.poser.get_dtype())

         with torch.no_grad():
-            output_image = self.poser.pose(self.source_image, pose)[0].float()  # [0]: model's output index for the full result image
-            output_image = convert_linear_to_srgb((output_image + 1.0) / 2.0)
+            # - [0]: model's output index for the full result image
+            # - model's data range is [-1, +1], linear intensity ("gamma encoded")
+            output_image = self.poser.pose(self.source_image, pose)[0].float()
+            # output_image = (output_image + 1.0) / 2.0  # -> [0, 1]
+            output_image.add_(1.0)
+            output_image.mul_(0.5)

             c, h, w = output_image.shape
-            output_image = (255.0 * torch.transpose(output_image.reshape(c, h * w), 0, 1)).reshape(h, w, c).byte()

+            # --------------------------------------------------------------------------------
+            # Postproc filters (TODO: refactor, and make configurable)
+
+            def apply_bloom(image: torch.Tensor, luma_threshold: float = 0.8, hdr_exposure: float = 0.7) -> None:
+                """Bloom effect (lighting bleed, fake HDR). Popular in early 2000s anime."""
+                # There are tutorials on the net, see e.g.:
+                #   https://learnopengl.com/Advanced-Lighting/Bloom
+
+                # Find the bright parts.
+                Y = 0.2126 * image[0, :, :] + 0.7152 * image[1, :, :] + 0.0722 * image[2, :, :]  # HDTV luminance
+                mask = torch.ge(Y, luma_threshold)  # [h, w]
+
+                # Make a copy of the image with just the bright parts.
+                mask = torch.unsqueeze(mask, 0)  # -> [1, h, w]
+                brights = image * mask  # [c, h, w]
+
+                # Blur the bright parts. Two-pass blur to save compute.
+                # It seems that in Torch, one large 1D blur is faster than looping with a smaller one.
+                brights = torchvision.transforms.GaussianBlur((21, 1), sigma=7.0)(brights)
+                brights = torchvision.transforms.GaussianBlur((1, 21), sigma=7.0)(brights)
+
+                # Additively blend the images (note we are working in linear intensity space).
+                image.add_(brights)
+
+                # We now have a fake HDR image. Tonemap it back to LDR.
+                image[:3, :, :] = 1.0 - torch.exp(-image[:3, :, :] * hdr_exposure)  # RGB: tonemap
+                image[3, :, :] = torch.maximum(image[3, :, :], brights[3, :, :])  # alpha: max-combine
+                torch.clamp_(image, min=0.0, max=1.0)
+
+            def apply_scanlines(image: torch.Tensor, field: int = 0, dynamic: bool = True) -> None:
+                """CRT TV like scanlines.
+
+                `field`: Which CRT field is dimmed at the first frame. 0 = top, 1 = bottom.
+                `dynamic`: If `True`, the dimmed field will alternate each frame (top, bottom, top, bottom, ...)
+                           for a more authentic CRT look (like the Phosphor deinterlacer in VLC).
+                """
+                if dynamic:
+                    start = (field + self.frame_evenodd) % 2
+                else:
+                    start = field
+                image[3, start::2, :].mul_(0.5)  # TODO: should ideally modify just the Y channel in YUV space
+                self.frame_evenodd = (self.frame_evenodd + 1) % 2
+
+            def apply_alphanoise(image: torch.Tensor, magnitude: float = 0.1) -> None:
+                """Dynamic noise in the alpha channel."""
+                base_magnitude = 1.0 - magnitude
+                image[3, :, :].mul_(base_magnitude + magnitude * torch.rand(h, w, device=self.device))
+
+            def apply_translucency(image: torch.Tensor, alpha: float = 0.9) -> None:
+                """Translucency for a hologram look."""
+                image[3, :, :].mul_(alpha)
+
+            # Apply the postprocess chain
+            apply_bloom(output_image)
+            apply_scanlines(output_image)
+            apply_alphanoise(output_image)
+            apply_translucency(output_image)
+
+            # end postproc filters
+            # --------------------------------------------------------------------------------
+
+            output_image = convert_linear_to_srgb(output_image)  # apply gamma correction
+
+            # convert [c, h, w] float -> [h, w, c] uint8
+            output_image = torch.transpose(output_image.reshape(c, h * w), 0, 1).reshape(h, w, c)
+            output_image = (255.0 * output_image).byte()

         output_image_numpy = output_image.detach().cpu().numpy()

         # Update the FPS counter, measuring animation frame render time only.
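The `convert_linear_to_srgb` call in the hunk above applies the standard sRGB transfer function to the linear-intensity output. For reference, that transfer function looks roughly like this; this is a generic sketch of the standard formula, not the exact THA3 helper:

```python
import torch

def convert_linear_to_srgb(image: torch.Tensor) -> torch.Tensor:
    """Apply the standard sRGB transfer function ("gamma correction")
    to linear-intensity data in [0, 1]. Reference sketch of the formula."""
    linear_part = 12.92 * image
    gamma_part = 1.055 * torch.pow(image.clamp(min=0.0), 1.0 / 2.4) - 0.055
    # Small values use the linear segment; the rest use the power curve.
    return torch.where(image <= 0.0031308, linear_part, gamma_part)
```

This is why the postproc filters run *before* the conversion: additive effects like bloom are only physically meaningful in linear intensity space.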
@@ -973,26 +973,84 @@ class MainFrame(wx.Frame):

 if __name__ == "__main__":
     parser = argparse.ArgumentParser(description="THA 3 Manual Poser. Pose a character image manually. Useful for generating static expression images.")
-    parser.add_argument("--model",
-                        type=str,
-                        required=False,
-                        default="separable_float",
-                        choices=["standard_float", "separable_float", "standard_half", "separable_half"],
-                        help="The model to use. 'float' means fp32, 'half' means fp16.")
     parser.add_argument("--device",
                         type=str,
                         required=False,
                         default="cuda",
                         choices=["cpu", "cuda"],
                         help='The device to use for PyTorch ("cuda" for GPU, "cpu" for CPU).')
+    parser.add_argument("--model",
+                        type=str,
+                        required=False,
+                        default="separable_float",
+                        choices=["standard_float", "separable_float", "standard_half", "separable_half"],
+                        help="The model to use. 'float' means fp32, 'half' means fp16.")
+    parser.add_argument("--models",
+                        metavar="HFREPO",
+                        type=str,
+                        help="If THA3 models are not yet installed, use the given HuggingFace repository to install them. Defaults to OktayAlpk/talking-head-anime-3.",
+                        default="OktayAlpk/talking-head-anime-3")
+    parser.add_argument("--factory-reset",
+                        metavar="EMOTION",
+                        type=str,
+                        help="Overwrite the emotion preset EMOTION with its factory default, and exit. This CANNOT be undone!",
+                        default="")
+    parser.add_argument("--factory-reset-all",
+                        action="store_true",
+                        help="Overwrite ALL emotion presets with their factory defaults, and exit. This CANNOT be undone!")
     args = parser.parse_args()

+    # Blunder recovery options
+    if args.factory_reset_all:
+        print("Factory-resetting all emotion templates...")
+        with open(os.path.join("emotions", "_defaults.json"), "r") as json_file:
+            factory_default_emotions = json.load(json_file)
+        factory_default_emotions.pop("zero")  # not an actual emotion
+        for key in factory_default_emotions:
+            with open(os.path.join("emotions", f"{key}.json"), "w") as file:
+                json.dump({key: factory_default_emotions[key]}, file, indent=4)
+        print("Done.")
+        sys.exit(0)
+    if args.factory_reset:
+        key = args.factory_reset
+        print(f"Factory-resetting emotion template '{key}'...")
+        with open(os.path.join("emotions", "_defaults.json"), "r") as json_file:
+            factory_default_emotions = json.load(json_file)
+        factory_default_emotions.pop("zero")  # not an actual emotion
+        if key not in factory_default_emotions:
+            print(f"No such factory-defined emotion: '{key}'. Valid values: {sorted(list(factory_default_emotions.keys()))}")
+            sys.exit(1)
+        with open(os.path.join("emotions", f"{key}.json"), "w") as file:
+            json.dump({key: factory_default_emotions[key]}, file, indent=4)
+        print("Done.")
+        sys.exit(0)
+
+    # Install the THA3 models if needed
+    modelsdir = os.path.join(os.getcwd(), "tha3", "models")
+    if not os.path.exists(modelsdir):
+        # API:
+        #   https://huggingface.co/docs/huggingface_hub/en/guides/download
+        try:
+            from huggingface_hub import snapshot_download
+        except ImportError:
+            raise ImportError(
+                "You need to install huggingface_hub to install talkinghead models automatically. "
+                "See https://pypi.org/project/huggingface-hub/ for installation."
+            )
+        os.makedirs(modelsdir, exist_ok=True)
+        print(f"THA3 models not yet installed. Installing from {args.models} into tha3/models.")
+        # Installing with symlinks would be generally better, but MS Windows support for symlinks is not optimal,
+        # so for maximal compatibility we avoid them. The drawback of installing directly as plain files is that
+        # if multiple programs need to download THA3, they will do so separately. But THA3 is rather rare, so in
+        # practice this is unlikely to be an issue.
+        snapshot_download(repo_id=args.models, local_dir=modelsdir, local_dir_use_symlinks=False)
+
     try:
         device = torch.device(args.device)
-        poser = load_poser(args.model, device, modelsdir="tha3/models")
+        poser = load_poser(args.model, device, modelsdir=modelsdir)
     except RuntimeError as e:
         logger.error(e)
-        sys.exit()
+        sys.exit(255)

     # Create the "talkinghead/output" directory if it doesn't exist. This is our default save location.
     p = pathlib.Path("output").expanduser().resolve()
@@ -80,30 +80,31 @@ def load_emotion_presets(directory: str) -> Tuple[Dict[str, Dict[str, float]], L
     a neutral pose. (This is separate from the "neutral" emotion, which is allowed
     to be "non-zero".)
     """
-    emotion_names = []
+    emotion_names = set()
     for root, dirs, files in os.walk(directory, topdown=True):
         for filename in files:
             if filename == "_defaults.json":  # skip the repository containing the default fallbacks
                 continue
             if filename.lower().endswith(".json"):
-                emotion_names.append(filename[:-5])  # drop the ".json"
+                emotion_names.add(filename[:-5])  # drop the ".json"

+    # Load the factory-default emotions as a fallback
+    with open(os.path.join(directory, "_defaults.json"), "r") as json_file:
+        factory_default_emotions = json.load(json_file)
+    for key in factory_default_emotions:  # get keys from here too, in case some emotion files are missing
+        if key != "zero":  # not an actual emotion, but a "reset character" feature
+            emotion_names.add(key)
+
+    emotion_names = list(emotion_names)
     emotion_names.sort()  # the 28 actual emotions

-    # TODO: Note that currently, we build the list of emotion names from JSON filenames,
-    # and then check whether each JSON implements the emotion matching its filename.
-    # On second thought, I'm not sure whether that makes much sense. Maybe rethink the design.
-    #   - We *do* want custom JSON files to show up in the list, if those are placed in "tha3/emotions". So the list of emotions shouldn't be hardcoded.
-    #   - *Having* a fallback repository with factory defaults (and a hidden "zero" preset) is useful.
-    #     But we are currently missing a way to reset an emotion to its factory default.
     def load_emotion_with_fallback(emotion_name: str) -> Dict[str, float]:
         try:
             with open(os.path.join(directory, f"{emotion_name}.json"), "r") as json_file:
                 emotions_from_json = json.load(json_file)  # A single json file may contain presets for multiple emotions.
             posedict = emotions_from_json[emotion_name]
-        except (FileNotFoundError, KeyError):  # If no separate json exists for the specified emotion, load the default (all 28 emotions have a default).
-            with open(os.path.join(directory, "_defaults.json"), "r") as json_file:
-                emotions_from_json = json.load(json_file)
-            posedict = emotions_from_json[emotion_name]
+        except (FileNotFoundError, KeyError):  # If no separate json exists for the specified emotion, load the factory default (all 28 emotions have a default).
+            posedict = factory_default_emotions[emotion_name]
+            # If still not found, it's an error, so fail-fast: let the app exit with an informative exception message.
         return posedict
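The fallback behavior of `load_emotion_with_fallback` in the hunk above can be demonstrated in isolation with plain dicts. This is a simplified illustration of the same try/except pattern, without the file I/O; the names and emotion data here are made up:

```python
# Stand-in for the parsed contents of _defaults.json (illustrative values only).
factory_default_emotions = {"anger": {"eyebrow_angry_left_index": 1.0},
                            "joy": {"eye_happy_wink_left_index": 1.0}}

def load_with_fallback(emotion_name, per_emotion_files):
    """Prefer a per-emotion preset; fall back to the factory default.
    `per_emotion_files` stands in for the per-emotion JSON files on disk,
    each of which may contain presets for multiple emotions."""
    try:
        return per_emotion_files[emotion_name][emotion_name]
    except KeyError:
        # No separate preset exists; use the factory default.
        # If the emotion is unknown even there, fail fast with a KeyError.
        return factory_default_emotions[emotion_name]
```

This is why deleting `_defaults.json` breaks the app: it is the last line of defense for every emotion name.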