## Talkinghead TODO

### High priority

- BACKEND: Add a server-side config for animator and postprocessor settings.
  - For symmetry with emotion handling, but also because the target FPS is foreseeably an installation-wide setting rather than a per-character one.
    Currently we don't have a way to set it installation-wide.
- Postprocessor for static character expression sprites.
  - This would require reimplementing the static sprite system at the `talkinghead` end (so that we can apply per-frame dynamic postprocessing),
    and then serving that as `result_feed`.
  - An easier solution is to fake it: just invoke the poser once for the target pose of each expression (lazily, as each expression is first seen),
    and cache the results. The engine should actually already do this; it seems to use some sort of cached policy. Disable just the animation parts.
    This would still use THA3, but without animation, and would likely be usable in CPU mode, possibly even with some light postprocessing.
- BACKEND: Fix timing of microsway based on a 25 FPS reference (currently the character may jitter too much at high FPS).
- BACKEND: Fix timing of dynamic postprocessor effects; these should also use a 25 FPS reference.
- FRONTEND: Check whether zip upload refreshes the talkinghead character (it should).
- FRONTEND: Switching `talkinghead` mode on/off in Character Expressions should set the expression to the current emotion.
  - The client *does* store the emotion, as evidenced by this quick reply STScript:
    `/lastsprite {{char}} | /echo Current sprite of {{char}}: {{pipe}}`
    So we should find what implements the `/lastsprite` slash command, to locate where the emotion is stored.
- FRONTEND: If `classify` is enabled, the emotion state should be updated from the latest AI-generated text
  when switching chat files, to resume in the same emotion state where the chat left off.
  - Use the expression setting mechanism to set the emotion.
  - Investigate what calls `/api/classify` (other than the expression setting code in Character Expressions); classifying updates the talkinghead state.
    We should make the same code (at the client end) also update the sprite if Character Expressions is enabled, and call that code after switching to a different chat.
- FRONTEND: Are there other places in *Character Expressions* (`SillyTavern/public/scripts/extensions/expressions/index.js`)
  where we need to check whether the `talkinghead` module is enabled? `(!isTalkingHeadEnabled() || !modules.includes('talkinghead'))`
- DOCUMENTATION: Polish up the documentation for release.
  - Add pictures to the talkinghead README.
    - Screenshot of the manual poser. Anything else we should say about it?
    - Examples of generated poses, highlighting both success and failure cases. How the live talking head looks in the actual SillyTavern GUI. Link the original THA tech reports.
    - Examples of postprocessor filter results.
    - How each postprocessor example config looks when rendering the example character.
  - Merge appropriate material from old user manual into the new README.
  - Update/rewrite the user manual, based on the new README.
    - This should replace the old manual at https://docs.sillytavern.app/extras/extensions/talkinghead/
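
A minimal sketch of the installation-wide config from the server-side config item, assuming a hypothetical JSON file and hypothetical key names (`target_fps`, `postprocessor_chain`). Neither the file nor the keys exist in the current code. Keys present in the file override built-in defaults, so a missing or partial file still yields a complete config:

```python
import copy
import json
from pathlib import Path
from typing import Any, Dict

# Hypothetical defaults; these key names are illustrative, not real talkinghead settings.
DEFAULT_ANIMATOR_CONFIG: Dict[str, Any] = {
    "target_fps": 25,
    "postprocessor_chain": [],
}


def merge_config(overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the defaults, with known keys replaced by values from `overrides`."""
    config = copy.deepcopy(DEFAULT_ANIMATOR_CONFIG)  # never mutate the shared defaults
    for key, value in overrides.items():
        if key in config:  # unknown keys are ignored
            config[key] = value
    return config


def load_animator_config(path: Path) -> Dict[str, Any]:
    """Load the installation-wide config from a JSON file, or use the defaults if it is missing."""
    if not path.exists():
        return merge_config({})
    with path.open(encoding="utf-8") as f:
        return merge_config(json.load(f))
```

Ignoring unknown keys, rather than passing them through, means a typo in the file cannot introduce a setting that nothing reads. Logging a warning for those keys would be a reasonable addition.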
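
The "fake it" approach to static sprites is to invoke the poser once per expression, lazily, and cache the result. Its caching part could look like the sketch below. `render_pose` is a hypothetical stand-in for whatever runs THA3 on the target pose of an expression:

```python
from typing import Callable, Dict, Generic, TypeVar

Image = TypeVar("Image")  # whatever the poser returns (e.g. a tensor)


class StaticPoseCache(Generic[Image]):
    """Invoke the (expensive) poser at most once per expression, on first use.

    `render_pose` is a hypothetical callable, not an existing talkinghead API.
    """

    def __init__(self, render_pose: Callable[[str], Image]) -> None:
        self._render_pose = render_pose
        self._cache: Dict[str, Image] = {}

    def get(self, expression: str) -> Image:
        # Lazy rendering: only the first request for an expression pays the cost.
        if expression not in self._cache:
            self._cache[expression] = self._render_pose(expression)
        return self._cache[expression]

    def clear(self) -> None:
        # Every cached pose depends on the source image, so drop them all when it changes.
        self._cache.clear()
```

The cache must be cleared whenever a new `talkinghead.png` is loaded.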
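
One common approach to the 25 FPS reference timing fixes is sketched below. It assumes, without checking the current code, that microsway and the dynamic postprocessor effects are driven by per-frame constants tuned at 25 FPS. There are two such constants: a decay factor applied once per frame, and the standard deviation of a random step taken once per frame. Both are rescaled by the actual frame duration `dt`, so the behavior per second stays the same at any FPS:

```python
import math

REFERENCE_FPS = 25.0  # frame rate at which the per-frame constants were tuned


def decay_factor(per_frame_decay_at_ref: float, dt: float) -> float:
    """Per-frame decay factor for a frame lasting `dt` seconds.

    Gives the same total decay per second as the factor tuned at 25 FPS,
    regardless of the actual frame rate.
    """
    return per_frame_decay_at_ref ** (dt * REFERENCE_FPS)


def noise_std(per_frame_std_at_ref: float, dt: float) -> float:
    """Standard deviation of a per-frame random step for a frame lasting `dt` seconds.

    The variance of a random walk grows linearly with the number of steps, so keeping
    the variance per second constant means scaling the std by sqrt(dt * 25).
    """
    return per_frame_std_at_ref * math.sqrt(dt * REFERENCE_FPS)
```

At 50 FPS (`dt = 0.02`), a per-frame decay of 0.9 becomes 0.9 ** 0.5 ≈ 0.949 per frame. Two frames then decay by 0.9 in total, as one frame did at 25 FPS. The square root for the noise keeps a random walk from jittering more at high FPS: taking twice as many steps doubles the variance unless each step's variance is halved.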

### Medium priority

- FRONTEND: When a new talkinghead sprite is uploaded:
  - The preview thumbnail in the client doesn't update.
- FRONTEND: Not related to talkinghead, but since I have a TODO list here:
  - In *Manage chat files*, when using the search feature, clicking on a search result either does nothing,
    or opens the wrong chat. When not searching, clicking on a previous chat correctly opens that specific chat.

### Low priority

- FRONTEND/BACKEND: To save GPU resources, automatically pause animation when the web browser window with SillyTavern is not in focus. Resume when it regains focus.
  - Needs a new API endpoint for pause/resume. Note the current `/api/talkinghead/unload` is actually a pause function (the client pauses, and
    then just hides the live image), but there is currently no resume function (except `/api/talkinghead/load`, which requires sending an image file).
- BACKEND: Low compute mode: static poses + postprocessor.
  - Poses would be generated from `talkinghead.png` using THA3, as usual, but only once per session. Each pose would be cached.
  - To prevent postproc hiccups (in dynamic effects such as CRT TV simulation) during static pose generation in CPU mode, there are at least two possible approaches.
    - Generate all poses when the plugin starts. At 2 FPS and 28 poses, this would lead to a 14-second delay. Not good.
    - Run the postprocessor in yet another thread, and postproc the most recent poser output available.
      - This would introduce one more frame of buffering, and split the render thread into two: the poser (which is 99% of the current `Animator`),
        and the postprocessor (which is invoked by `Animator`, but implemented in a separate class).
      - This *might* make it feasible to use CPU mode for static poses with postprocessing.
      - But I'll need to benchmark the postproc code first, to see whether it's fast enough to run on CPU in realtime.
  - Alpha-blending between the static poses would need to be implemented in the `talkinghead` module, similarly to how the frontend switches between static expression sprites.
  - Maybe a clean way would be to provide different posing strategies (alternative poser classes): realtime posing, or static posing with alpha-blending.
- FRONTEND: Add live-modifiable configuration for animation and postprocessor settings?
  - Add a new control panel to SillyTavern client extension settings.
  - Send new configs to backend whenever anything changes.
- BACKEND: Small performance optimization: see if we could use more in-place updates in the postprocessor, to reduce allocation of temporary tensors.
  - The effect on speed will be small; the compute-heaviest part is the inference of the THA3 deep-learning model.
- BACKEND: Add more postprocessing filters. Possible ideas, no guarantee I'll ever get around to them:
  - Pixelize, posterize (8-bit look)
  - Analog video glitches
    - Partition image into bands, move some left/right temporarily
  - Missing data (zero out the alpha?)
  - Blur (leads to replacing by average color, with controllable sigma)
  - Zigzag deformation
- BACKEND: Investigate whether some particular emotions could use a small random per-frame oscillation applied to "iris_small",
  for that anime "intense emotion" effect (since THA3 doesn't have a morph specifically for the specular reflections in the eyes).
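
For the pause/resume item, the backend side could gate the render loop with a `threading.Event`. A paused animator then blocks instead of rendering. This is a minimal sketch: `render_frame` is a hypothetical stand-in for the `Animator`'s per-frame work. The pause/resume API endpoints, which do not exist yet, would just call `pause()` and `resume()`:

```python
import threading
import time
from typing import Callable


class PausableRenderLoop:
    """Run `render_frame` repeatedly in a background thread, with pause/resume.

    `render_frame` is a hypothetical callable, not a real talkinghead function.
    """

    def __init__(self, render_frame: Callable[[], None], fps: float = 25.0) -> None:
        self._render_frame = render_frame
        self._frame_time = 1.0 / fps
        self._running = threading.Event()  # set = animating, clear = paused
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._loop, daemon=True)

    def start(self) -> None:
        self._running.set()
        self._thread.start()

    def pause(self) -> None:
        self._running.clear()

    def resume(self) -> None:
        self._running.set()

    def stop(self) -> None:
        self._stop.set()
        self._running.set()  # wake the thread if it is paused, so that it can exit
        self._thread.join()

    def _loop(self) -> None:
        while not self._stop.is_set():
            self._running.wait()  # blocks without rendering anything while paused
            if self._stop.is_set():
                break
            self._render_frame()
            time.sleep(self._frame_time)  # crude pacing; a real loop would subtract render time
```

A frame that is already being rendered when `pause()` is called still completes. After that, the thread sleeps in `wait()` until it is resumed.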
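
The postprocessor could run in its own thread on the most recent poser output. For that, a single-slot "latest frame" mailbox fits better than a queue. The poser overwrites the slot, and the postprocessor always reads the newest frame, re-using it if no new pose has arrived yet. A sketch with hypothetical names:

```python
import threading
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


class LatestFrame(Generic[T]):
    """Single-slot mailbox: the poser overwrites, the postprocessor reads the newest.

    Unlike a queue, stale poser frames are dropped instead of buffered, so there is
    at most one frame of extra latency between poser and postprocessor.
    """

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._frame: Optional[T] = None
        self._version = 0  # number of frames put so far

    def put(self, frame: T) -> None:
        with self._cond:
            self._frame = frame
            self._version += 1
            self._cond.notify_all()

    def get(self, timeout: Optional[float] = None) -> Optional[T]:
        """Return the most recent frame; wait (up to `timeout`) only if none exists yet."""
        with self._cond:
            self._cond.wait_for(lambda: self._version > 0, timeout=timeout)
            return self._frame
```

Re-reading the same frame is intended. A dynamic effect such as the CRT TV simulation can keep running at the target FPS, even while the poser (e.g. in CPU mode) delivers new poses much more slowly.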
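
Alpha-blending between static poses amounts to a crossfade: a per-pixel linear interpolation whose weight ramps from 0 to 1 over the transition. Below is a pure-Python sketch; the real thing would operate on tensors. The smoothstep easing and the flat-list image format are assumptions for illustration:

```python
from typing import List, Sequence


def blend_weight(elapsed: float, duration: float) -> float:
    """Weight in [0, 1] of the new pose, `elapsed` seconds into a transition.

    Uses smoothstep, so the fade starts and ends gently instead of linearly.
    """
    if duration <= 0.0:
        return 1.0
    x = min(max(elapsed / duration, 0.0), 1.0)
    return x * x * (3.0 - 2.0 * x)


def crossfade(old: Sequence[float], new: Sequence[float], w: float) -> List[float]:
    """Per-channel linear interpolation between two images stored as flat float sequences.

    Assumes premultiplied-alpha RGBA, where plain per-channel interpolation is correct;
    with straight alpha, the colors of fully transparent pixels would bleed into the blend.
    """
    if len(old) != len(new):
        raise ValueError("images must have the same size")
    return [o + (n - o) * w for o, n in zip(old, new)]
```

With tensors, the interpolation is the same expression, `old + (new - old) * w`, applied to whole images at once.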
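
The pixelize and posterize filter ideas are simple enough to state exactly. This pure-Python sketch works on a single channel stored as nested lists, only to show the math. In the tensor-based postprocessor, pixelize could presumably be done with average pooling followed by nearest-neighbor upsampling:

```python
from typing import List

Channel = List[List[float]]  # one channel, values in [0, 1], indexed [row][col]


def posterize(img: Channel, levels: int) -> Channel:
    """Quantize each value to `levels` evenly spaced levels (levels >= 2)."""
    if levels < 2:
        raise ValueError("levels must be >= 2")
    step = levels - 1
    return [[round(v * step) / step for v in row] for row in img]


def pixelize(img: Channel, block: int) -> Channel:
    """Replace each block x block tile by its mean value (edge tiles may be smaller)."""
    if block < 1:
        raise ValueError("block must be >= 1")
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y0 in range(0, h, block):
        for x0 in range(0, w, block):
            ys = range(y0, min(y0 + block, h))
            xs = range(x0, min(x0 + block, w))
            mean = sum(img[y][x] for y in ys for x in xs) / (len(ys) * len(xs))
            for y in ys:
                for x in xs:
                    out[y][x] = mean
    return out
```

Python's `round` rounds halves to even, so a value exactly halfway between two levels may go either way. For a visual effect that does not matter.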

### Far future

- Fast, high-quality scaling mechanism.
  - On a 4K display, the character becomes rather small, which looks jarring on the default backgrounds.
  - The algorithm should be cartoon-aware, e.g. some modern-day equivalent of waifu2x. A GAN such as 4x-AnimeSharp or Remacri would be nice, but too slow.
  - Maybe the scaler should run at the client side to avoid the need to stream 1024x1024 PNGs.
    - What JavaScript anime scalers are there, or which algorithms are simple enough for a small custom implementation?
- Lip-sync talking animation to TTS output.
  - THA3 has morphs for A, I, U, E, O, and the "mouth delta" shape Δ.
  - This needs either:
    - Realtime data from client
    - Or if ST-extras generates the TTS output, then at least a start timestamp for the playback of a given TTS output audio file,
      and a possibility to stop animating if the user stops the audio.
- Group chats / visual novel mode / several talkingheads running simultaneously.
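
For lip-syncing from a start timestamp, the backend would need to know what the mouth should be doing at each moment of the audio. The sketch below assumes the backend also receives a timeline of mouth shapes (visemes) for the audio. Where that timeline would come from is an open question that this TODO does not settle. The morph keys `a`, `i`, `u`, `e`, `o` are placeholders for THA3's A, I, U, E, O morphs:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

MOUTH_MORPHS = ("a", "i", "u", "e", "o")  # placeholder keys, not the poser's real morph names


@dataclass
class Viseme:
    start: float  # seconds from the start of the audio
    end: float
    morph: str    # one of MOUTH_MORPHS


def mouth_weights(timeline: List[Viseme], playback_start: float, now: float,
                  stopped_at: Optional[float] = None, ramp: float = 0.04) -> Dict[str, float]:
    """Mouth morph weights at wall-clock time `now`.

    `playback_start` is when the client started playing the audio. If the user stopped
    the audio at `stopped_at`, the mouth stays closed from then on. Each viseme fades
    in and out over `ramp` seconds (one frame at 25 FPS by default) to avoid popping.
    """
    if ramp <= 0.0:
        raise ValueError("ramp must be positive")
    weights = {m: 0.0 for m in MOUTH_MORPHS}
    if stopped_at is not None and now >= stopped_at:
        return weights
    t = now - playback_start
    for v in timeline:
        if v.start - ramp < t < v.end + ramp:
            # Trapezoid: ramp up before `start`, full weight inside, ramp down after `end`.
            w = min(1.0, (t - (v.start - ramp)) / ramp, ((v.end + ramp) - t) / ramp)
            weights[v.morph] = max(weights[v.morph], w)
    return weights
```

Passing `stopped_at` covers the "user stops the audio" case: from that moment on, all mouth morphs are zero.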