update TODO

Juha Jeronen
2024-01-12 11:56:25 +02:00
parent 05eed3f0a3
commit 2fe7dcad8e


@@ -22,7 +22,7 @@
- The manual poser currently produces individual emotion `.json` files only.
- When batch-exporting from the manual poser, also automatically produce a combined `_emotions.json`.
- This also makes it easier to maintain `talkinghead/emotions/_defaults.json`, because the batch export then generates all necessary files.
- Optimize the JSON export to drop zeros, since zero is the default value - at least in `_emotions.json`.
  The individual emotion files could retain the zeros, to help discoverability.
- Add live-modifiable configuration for animation and postprocessor settings?
- Add a new control panel to SillyTavern client extension settings
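The batch-export item above could be sketched as follows. This is a minimal sketch, assuming each individual emotion file is a flat JSON object mapping morph names to floats (the real on-disk format may differ); it combines the per-emotion files into a single `_emotions.json`, dropping zero-valued entries since zero is the default:

```python
import json
import pathlib

def combine_emotions(emotion_dir: str, output_name: str = "_emotions.json") -> dict:
    """Merge individual emotion JSON files into one combined file.

    Zero-valued morphs are dropped from the combined output, since zero
    is the default. Assumes each emotion file is a flat
    {morph_name: float} object (hypothetical format).
    """
    combined = {}
    for path in sorted(pathlib.Path(emotion_dir).glob("*.json")):
        # Skip _defaults.json and any previously generated combined file.
        if path.name.startswith("_"):
            continue
        with open(path, "r", encoding="utf-8") as f:
            morphs = json.load(f)
        combined[path.stem] = {name: value for name, value in morphs.items()
                               if value != 0.0}
    out_path = pathlib.Path(emotion_dir) / output_name
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(combined, f, indent=2, sort_keys=True)
    return combined
```

The individual files are left untouched, so they can keep their zeros for discoverability.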
@@ -52,23 +52,39 @@
then the "set_emotion" endpoint.
- When a new talkinghead sprite is uploaded:
- The preview thumbnail in the client doesn't update.
- Are there other places in *Character Expressions* (`SillyTavern/public/scripts/extensions/expressions/index.js`)
where we need to check whether the `talkinghead` module is enabled? `(!isTalkingHeadEnabled() || !modules.includes('talkinghead'))`
- Check whether a zip upload refreshes the talkinghead character (it should).
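For the "set_emotion" endpoint mentioned above, a server-agnostic client sketch in Python. The endpoint path `/api/talkinghead/set_emotion` and the `emotion_name` payload key are assumptions based on this TODO; check the actual server routes before relying on them:

```python
import json
import urllib.request

SERVER_URL = "http://localhost:5100"  # default SillyTavern-extras port; adjust as needed

def set_emotion_request(emotion_name: str, server_url: str = SERVER_URL) -> urllib.request.Request:
    """Build a POST request for the talkinghead set_emotion endpoint.

    The endpoint path and payload key are assumptions; verify against
    the actual server route definitions.
    """
    payload = json.dumps({"emotion_name": emotion_name}).encode("utf-8")
    return urllib.request.Request(
        url=f"{server_url}/api/talkinghead/set_emotion",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# To actually send it: urllib.request.urlopen(set_emotion_request("joy"))
```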
### Common
- Add pictures to the talkinghead README.
- Screenshot of the manual poser. Anything else the user needs to know about it?
- Examples of generated poses, highlighting both success and failure cases. How the live talking head looks in the actual SillyTavern GUI.
- Document the per-character configuration in the README:
- Animator settings,
- Emotion templates,
- Postprocessor filters (with example pictures).
- Update the user manual.
- This extension really has nothing to do with VTubing, except that it uses a (different!) character animation technology that produces
  output similar to that of VTubing software such as Live2D.
- Emphasize that `talkinghead` is an AI-powered character animation technology that animates *the AI character's* avatar
(cf. VTubing where the idea is to animate the *user's* avatar). Current focus is on 1-on-1 interactions. Group chats
and visual novel mode are not supported.
- Far future:
- Fast, high-quality scaling mechanism.
- On a 4k display, the character becomes rather small, which looks jarring on the default backgrounds.
- The algorithm should be cartoon-aware, some modern-day equivalent of waifu2x. A GAN-based upscaler such as 4x-AnimeSharp or Remacri would be nice, but is too slow.
- Maybe the scaler should run at the client side to avoid the need to stream 1024x1024 PNGs.
- What JavaScript anime scalers are there, or which algorithms are simple enough for a small custom implementation?
- Lip-sync talking animation to TTS output.
- THA3 has morphs for A, I, U, E, O, and the "mouth delta" shape Δ.
- This needs either:
- Realtime data from client
- Or, if ST-extras generates the TTS output, at least a start timestamp for the playback of a given TTS audio file,
  and a way to stop animating if the user stops the audio.
- Postprocessor for static character expression sprites.
- This would need reimplementing the static sprite system at the `talkinghead` end (so that we can apply per-frame dynamic postprocessing),
and then serving that as `result_feed`.
- Group chats / visual novel mode / several talkingheads running simultaneously.
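The lip-sync idea above could be sketched as follows. The `(phoneme, start, end)` timing format and the phoneme-to-morph mapping are hypothetical; the morph names follow the THA3 A, I, U, E, O mouth morphs mentioned above. Given the wall-clock timestamp at which the client started playing a TTS audio file, each animation frame looks up which mouth morph should currently be active:

```python
import time

# Mapping from phoneme symbols to THA3 mouth morphs; the phoneme symbols
# and the exact morph names here are assumptions for illustration.
PHONEME_TO_MORPH = {
    "a": "mouth_aaa",
    "i": "mouth_iii",
    "u": "mouth_uuu",
    "e": "mouth_eee",
    "o": "mouth_ooo",
}

def active_mouth_morph(phoneme_timings, playback_start, now=None):
    """Pick the mouth morph for the current animation frame.

    `phoneme_timings` is a list of (phoneme, start_sec, end_sec) tuples,
    with times relative to the start of the TTS audio; `playback_start`
    is the wall-clock timestamp when the client began playback.
    Returns None when no phoneme is active (mouth closed, or audio
    stopped by the user).
    """
    if now is None:
        now = time.time()
    t = now - playback_start
    for phoneme, start, end in phoneme_timings:
        if start <= t < end:
            return PHONEME_TO_MORPH.get(phoneme)
    return None
```

Returning `None` past the end of the timing list also covers the "stop animating if the user stops the audio" case, provided the client reports the stop (e.g. by clearing the timing list).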