update TODO

2026-01-26 17:20:04 +00:00 · 2024-01-18 14:11:36 +02:00
parent 87aaedc13b
commit a304c6c64a
1 changed files with 94 additions and 6 deletions
--- a/talkinghead/TODO.md
+++ b/talkinghead/TODO.md
@@ -2,11 +2,13 @@

 ### High priority

+- BACKEND: Postprocessor: support several effects of the same kind in the chain.
+  - Mostly this already works, but for those dynamic effects that use a cache, only one cache currently exists for each kind of effect, so several instances of the same effect will step on each other's toes.
+  - An ID parameter, and making the cache an ID-keyed dictionary, solves this.
+- BACKEND: Add configurable crop filter to trim unused space around the sides of the character, to allow better positioning of the character in **MovingUI** mode.
 - BACKEND: Add a server-side config for animator and postprocessor settings.
  - For symmetry with emotion handling; but also foreseeable that target FPS is an installation-wide thing instead of a character-wide thing.
    Currently we don't have a way to set it installation-wide.
-- BACKEND: Fix timing of dynamic postprocessor effects, these should also use a 25 FPS reference.
-- BACKEND: Add configurable crop filter to trim unused space around the sides of the character, to allow better positioning of the character in **MovingUI** mode.
 - FRONTEND: Check zip upload whether it refreshes the talkinghead character (it should).
 - FRONTEND: Switching `talkinghead` mode on/off in Character Expressions should set the expression to the current emotion.
  - The client *does* store the emotion, as evidenced by this quick reply STScript:
@@ -35,10 +37,92 @@
 ### Medium priority

 - FRONTEND: When a new talkinghead sprite is uploaded:
-  - The preview thumbnail in the client doesn't update.
-- FRONTEND: Not related to talkinghead, but since I have a TODO list here:
+  - The preview thumbnail in the client doesn't update. (The same goes for the other sprites, so this is a general bug in *Character Expressions*.)
+- FRONTEND: Not related to talkinghead, but since I have a TODO list here, I'm dumping notes on some potentially easily fixable things here instead of opening a ticket for each one:
  - In *Manage chat files*, when using the search feature, clicking on a search result either does nothing,
-    or opens the wrong chat. When not searching, clicking on a previous chat correctly opens that specific chat.
+    or opens the wrong chat (often the latest one, whether or not it matched the search terms). When not searching,
+    clicking on a previous chat correctly opens that specific chat.
+  - *Render Formulas* shows both the rendered formula and its plaintext. Would look better to show only the rendered formula, unless the user wants to edit it
+    (like the inline LaTeX equation renderer in Emacs).
+  - Missing tooltips:
+    - **MovingUI** (*User Settings ⊳ Advanced*): "Allow repositioning certain UI elements by dragging them."
+      - **MUI Preset** = ??? Is this a theme selector for MovingUI, affecting how the dragging GUI looks, or something else?
+    - **No WI/AN** (Extensions ⊳ Vector Storage ⊳ Chat vectorization settings): "Do not vectorize World Info and Author's Note."
+    - **Depth** (appears in many places): "How many messages before the current end of the chat."
+      - I think this is important to clarify, because at least to a programmer, "depth" first brings to mind nested brackets; and brackets are actually used in ST,
+        to make parenthetical remarks to the AI (such as for summarization: "[Pause your roleplay. Summarize...]").
+    - **AI Response Configuration**:
+      - **Top P**: Otherwise fine, but maybe mention that Top P is also known as nucleus sampling.
+      - **Top A**: Relative of Min P, but operates on squared probabilities.
+        - See https://www.reddit.com/r/KoboldAI/comments/vcgsu1/comment/icrp0n1
+      - **Tail Free Sampling**: "Estimates where the 'knee' of the next-token probability distribution is, and cuts the tail off at that point."
+        - I would assume the slider controls the `z` value, but this should be confirmed from the source code.
+        - See https://www.trentonbricken.com/Tail-Free-Sampling/
+      - **Typical P** = ???
+      - **Epsilon Cutoff** = ???
+      - **Eta Cutoff** = ???
+      - **Mirostat**: "Thermostat for output perplexity. Controls the output perplexity directly, steering it toward a configurable target value. This avoids the
+        boredom trap (where, as the autoregressive inference produces text, the perplexity of the output keeps falling and the text becomes repetitive) and the confusion
+        trap (where the perplexity diverges)."
+        - See https://arxiv.org/abs/2007.14966
+        - In practice, Min P can lead to similarly good results, while being simpler and faster. Should we mention this?
+      - **Beam Search** = ???
+        - At least it's the name of a classical optimization method in numerics. Also, in LLM sampling, beam search is infamous for its bad performance;
+          easily gets stuck in a repetition loop (which hints that it always picks tokens that are too probable, decreasing output perplexity).
+          I think this was mentioned in one of the Contrastive Search papers.
+      - **Contrast Search**: "The representation space of most LLMs is isotropic, and this sampler exploits that in order to encourage diversity while maintaining coherence."
+        - Name should be "Contrastive Search"
+        - In math terms, this is a minor modification to an older, standard sampling strategy. Have to re-read the paper to check details.
+          In any case, the penalty alpha controls the relative strength of the regularization term.
+        - See https://arxiv.org/abs/2202.06417 , https://arxiv.org/abs/2210.14140
+        - In practice this method produces pretty good results, just like Min P does.
+      - **Temperature Last**: We should probably emphasize that Temperature Last is the sensible thing to do: pick the set of plausible tokens first, then tweak their
+        relative probabilities (actually logits). Don't tweak the full distribution first, and then pick the token set from that, because this tends to amplify
+        the probability of an incoherent response too much (which is what happens if Temperature Last is off).
+      - **CFG**: Classifier-Free Guidance.
+        - Should also explain what it does... at least ooba uses CFG to control the strength of the negative prompt?
+    - *User Settings ⊳ Advanced*:
+      - **No Text Shadows**: obvious, but missing a tooltip
+      - **Visual Novel Mode**: what exactly does VN mode do, and how does it relate to group chats? What does it do in a 1-on-1 chat? Maybe needs a link to the manual, or something.
+      - **Expand Message Actions** = ??? What are message actions?
+      - **Zen Sliders** = ???
+      - **Mad Lab Mode** = ???
+      - **Message Timer**: "Time the AI's message generation, and show the duration in the chat log."
+      - **Chat Timestamps**: obvious, but missing a tooltip
+      - **Model Icons** = ???
+      - **Message IDs**: "Show message numbers in the chat log."
+      - **Message Token Count**: "Show number of tokens in each message in the chat log."
+      - **Compact Input Area** = ??? Nothing happens when toggling this on PC.
+      - **Characters Hotswap**: "In the Character Management panel, show quick selection buttons for favorited characters."
+      - **Tags as Folders** = ??? What are tags? How to use them? Link to manual?
+      - **Message Sound** = ??? Has a link to the manual, could extract a one-line summary from there.
+      - **Background Sound Only** = ???
+      - **Custom CSS** = ??? What is the scope where the custom style applies? Just MovingUI, or the whole ST GUI? Where to get an example style to learn how to make new ones?
+      - **Example Messages Behavior**: obvious, but missing a tooltip
+      - **Advanced Character Search** = ???
+      - **Never resize avatars** = ???
+      - **Show avatar filenames** = ??? This seems to affect the *Character Management* panel only, not *Character Expressions* sprites?
+      - **Import Card Tags** = ??? Something to do with the PNG character card thing?
+      - **Spoiler Free Mode** = ???
+      - **"Send" to Continue** = ??? Sending the message to the AI continues the last message instead of generating a new one? How do you generate a new one, then?
+      - **Quick "Continue" button**: "Show a button in the input area to ask the AI to continue (extend) its last message."
+      - **Swipes**: "Generate alternative responses before choosing which one to commit. Shows arrow buttons next to the AI's last message."
+      - **Gestures** = ???
+      - **Auto-load Last Chat**: obvious, but missing a tooltip
+      - **Auto-scroll Chat**: obvious, but missing a tooltip
+      - **Auto-save Message Edits** = ??? When does the autosave happen?
+      - **Confirm Message Deletion**: obvious, but missing a tooltip
+      - **Auto-fix Markdown** = ??? What exactly does it fix in Markdown, and using what algorithm?
+      - **Render Formulas**: "Render LaTeX and JSMath equation notation in chat messages."
+      - **Show {{char}}: in responses**: obvious, but missing a tooltip
+      - **Show {{user}}: in responses**: obvious, but missing a tooltip
+      - **Show tags in responses** = ???
+      - **Log prompts to console**: obvious, but missing a tooltip
+      - **Auto-swipe**: obvious once you expand the panel and look at the available settings, but missing a tooltip.
+        "Automatically reject and re-generate AI message based on configurable criteria."
+      - **Reload Chat** = ??? What exactly gets reloaded?
+    - Probably lots more. Maybe open a ticket and start fixing these?
+
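A small sketch for the **Temperature Last** note above, to make the ordering argument concrete. It assumes Min P as the truncation sampler and uses made-up logits; the function names are hypothetical and not taken from any backend.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of `logits` scaled by 1/temperature."""
    m = max(logits)
    exps = [math.exp((x - m) / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def min_p_keep(probs, min_p):
    """Indices of tokens with probability >= min_p * (top probability)."""
    threshold = min_p * max(probs)
    return [i for i, p in enumerate(probs) if p >= threshold]


def sample_dist(logits, temperature, min_p, temperature_last):
    """Return {token index: probability} after truncation and temperature."""
    if temperature_last:
        # Pick the plausible token set from the untouched distribution...
        keep = min_p_keep(softmax(logits), min_p)
    else:
        # ...or from the distribution that temperature has already flattened.
        keep = min_p_keep(softmax(logits, temperature), min_p)
    # Either way, temperature reshapes the relative weights of the survivors.
    kept_probs = softmax([logits[i] for i in keep], temperature)
    return dict(zip(keep, kept_probs))


logits = [5.0, 4.0, 1.0, 0.0]  # illustrative values only
last = sample_dist(logits, temperature=2.0, min_p=0.1, temperature_last=True)
first = sample_dist(logits, temperature=2.0, min_p=0.1, temperature_last=False)
print(sorted(last), sorted(first))
```

With these numbers, Temperature Last keeps tokens 0 and 1, while applying temperature first also lets the unlikely token 2 through the Min P filter.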

 ### Low priority

@@ -64,7 +148,11 @@
 - BACKEND: Add more postprocessing filters. Possible ideas, no guarantee I'll ever get around to them:
  - Pixelize, posterize (8-bit look)
  - Analog video glitches
-    - Partition image into bands, move some left/right temporarily
+    - Partition image into bands, move some left/right temporarily (for a few frames now that we can do that)
+    - Another effect of bad VHS hsync: dynamic "bending" effect near top edge:
+      - Distortion by horizontal movement
+      - Topmost row of pixels moves the most, then a smoothly decaying offset profile as a function of height (decaying to zero at maybe 20% of image height, measured from the top)
+      - The maximum offset flutters dynamically in a semi-regular, semi-unpredictable manner (use a superposition of three sine waves at different frequencies, as functions of time)
  - Digital data connection glitches
    - Apply to random rectangles; may need to persist for a few frames to animate and/or make them more noticeable
    - May need to protect important regions like the character's head (approximately, from the template); we're after "Hollywood glitchy", not actually glitchy
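
A rough sketch of the bad-VHS-hsync "bending" offset described under analog video glitches above. It only computes per-row horizontal offsets; applying them to an image is left out. The function name, the three frequencies, the maximum shift, and the cosine-ramp decay profile are all assumptions chosen for illustration, not anything that exists in the postprocessor.

```python
import math


def bend_offsets(height, t, max_shift_px=12.0, decay_fraction=0.2,
                 freqs_hz=(0.7, 1.9, 4.3)):
    """Per-row horizontal offsets (in pixels) at time `t` (seconds).

    Row 0 is the top of the image and moves the most; the offset decays
    smoothly to zero at `decay_fraction` of the image height.
    """
    # Flutter: superposition of three sine waves, normalized to [-1, 1].
    flutter = sum(math.sin(2.0 * math.pi * f * t) for f in freqs_hz) / len(freqs_hz)
    amplitude = max_shift_px * flutter

    cutoff = decay_fraction * height  # rows at or below this height stay put
    offsets = []
    for row in range(height):
        if row >= cutoff:
            offsets.append(0.0)
        else:
            x = row / cutoff  # 0 at the top edge, approaching 1 at the cutoff
            # Cosine ramp: 1 at the top, 0 at the cutoff, zero slope at both ends.
            profile = 0.5 * (1.0 + math.cos(math.pi * x))
            offsets.append(amplitude * profile)
    return offsets


offsets = bend_offsets(height=100, t=0.3)
```

Using incommensurate frequencies keeps the flutter from repeating too obviously; the values here are arbitrary and would need tuning by eye.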