diff --git a/talkinghead/TODO.md b/talkinghead/TODO.md
index 09f66ff..74de29b 100644
--- a/talkinghead/TODO.md
+++ b/talkinghead/TODO.md
@@ -2,11 +2,13 @@
 ### High priority

+- BACKEND: Postprocessor: support several effects of the same kind in the chain.
+  - Mostly this already works, but for those dynamic effects that use a cache, only one cache currently exists for each kind of effect, so they will step on each other's toes.
+  - An ID parameter, and making the cache an ID-keyed dictionary, solves this.
+- BACKEND: Add configurable crop filter to trim unused space around the sides of the character, to allow better positioning of the character in **MovingUI** mode.
 - BACKEND: Add a server-side config for animator and postprocessor settings.
   - For symmetry with emotion handling; but also foreseeable that target FPS is an installation-wide thing instead of a character-wide thing. Currently we don't have a way to set it installation-wide.
-- BACKEND: Fix timing of dynamic postprocessor effects, these should also use a 25 FPS reference.
-- BACKEND: Add configurable crop filter to trim unused space around the sides of the character, to allow better positioning of the character in **MovingUI** mode.
 - FRONTEND: Check zip upload whether it refreshes the talkinghead character (it should).
 - FRONTEND: Switching `talkinghead` mode on/off in Character Expressions should set the expression to the current emotion.
   - The client *does* store the emotion, as evidenced by this quick reply STScript:
@@ -35,10 +37,92 @@
 ### Medium priority

 - FRONTEND: When a new talkinghead sprite is uploaded:
-  - The preview thumbnail in the client doesn't update.
-- FRONTEND: Not related to talkinghead, but since I have a TODO list here:
+  - The preview thumbnail in the client doesn't update. (The same goes for the other sprites, so this is a general bug in *Character Expressions*.)
+- FRONTEND: Not related to talkinghead, but since I have a TODO list here, I'm dumping notes on some potentially easily fixable things here instead of opening a ticket for each one:
   - In *Manage chat files*, when using the search feature, clicking on a search result either does nothing,
-    or opens the wrong chat. When not searching, clicking on a previous chat correctly opens that specific chat.
+    or opens the wrong chat (often the latest one, whether or not it matched the search terms). When not searching,
+    clicking on a previous chat correctly opens that specific chat.
+  - *Render Formulas* shows both the rendered formula and its plaintext. Would look better to show only the rendered formula, unless the user wants to edit it
+    (like the inline LaTeX equation renderer in Emacs).
+  - Missing tooltips:
+    - **MovingUI** (*User Settings ⊳ Advanced*): "Allow repositioning certain UI elements by dragging them."
+    - **MUI Preset** = ??? Is this a theme selector for MovingUI, affecting how the dragging GUI looks, or something else?
+    - **No WI/AN** (Extensions ⊳ Vector Storage ⊳ Chat vectorization settings): "Do not vectorize World Info and Author's Note."
+    - **Depth** (appears in many places): "How many messages before the current end of the chat."
+      - I think this is important to clarify, because at least to a programmer, "depth" first brings to mind nested brackets; and brackets are actually used in ST,
+        to make parenthetical remarks to the AI (such as for summarization: "[Pause your roleplay. Summarize...]").
+    - **AI Response Configuration**:
+      - **Top P**: Otherwise fine, but maybe mention that Top P is also known as nucleus sampling.
+      - **Top A**: Relative of Min P, but operates on squared probabilities.
+        - See https://www.reddit.com/r/KoboldAI/comments/vcgsu1/comment/icrp0n1
+      - **Tail Free Sampling**: "Estimates where the 'knee' of the next-token probability distribution is, and cuts the tail off at that point."
+        - I would assume the slider controls the `z` value, but this should be confirmed from the source code.
+        - See https://www.trentonbricken.com/Tail-Free-Sampling/
+      - **Typical P** = ???
+      - **Epsilon Cutoff** = ???
+      - **Eta Cutoff** = ???
+      - **Mirostat**: "Thermostat for output perplexity. Controls the output perplexity directly, to match the perplexity of the input. This avoids the
+        repetition trap (where, as the autoregressive inference produces text, the perplexity of the output tends toward zero) and the confusion
+        trap (where the perplexity diverges)."
+        - See https://arxiv.org/abs/2007.14966
+        - In practice, Min P can lead to similarly good results, while being simpler and faster. Should we mention this?
+      - **Beam Search** = ???
+        - At least it's the name of a classical optimization method in numerics. Also, in LLM sampling, beam search is infamous for its bad performance;
+          easily gets stuck in a repetition loop (which hints that it always picks tokens that are too probable, decreasing output perplexity).
+          I think this was mentioned in one of the Contrastive Search papers.
+      - **Contrast Search**: "The representation space of most LLMs is isotropic, and this sampler exploits that in order to encourage diversity while maintaining coherence."
+        - Name should be "Contrastive Search"
+        - In math terms, this is a minor modification to an older, standard sampling strategy. Have to re-read the paper to check details.
+          In any case, the penalty alpha controls the relative strength of the regularization term.
+        - See https://arxiv.org/abs/2202.06417 , https://arxiv.org/abs/2210.14140
+        - In practice this method produces pretty good results, just like Min P does.
+      - **Temperature Last**: We should probably emphasize that Temperature Last is the sensible thing to do: pick the set of plausible tokens first, then tweak their
+        relative probabilities (actually logits). Don't tweak the full distribution first, and then pick the token set from that, because this tends to amplify
+        the probability of an incoherent response too much (which is what happens if Temperature Last is off).
+      - **CFG**: Classifier-Free Guidance.
+        - Should also explain what it does... at least ooba uses CFG to control the strength of the negative prompt?
+    - *User Settings ⊳ Advanced*:
+      - **No Text Shadows**: obvious, but missing a tooltip
+      - **Visual Novel Mode**: what exactly does VN mode do, and how does it relate to group chats? What does it do in a 1-on-1 chat? Maybe needs a link to the manual, or something.
+      - **Expand Message Actions** = ??? What are message actions?
+      - **Zen Sliders** = ???
+      - **Mad Lab Mode** = ???
+      - **Message Timer**: "Time the AI's message generation, and show the duration in the chat log."
+      - **Chat Timestamps**: obvious, but missing a tooltip
+      - **Model Icons** = ???
+      - **Message IDs**: "Show message numbers in the chat log."
+      - **Message Token Count**: "Show number of tokens in each message in the chat log."
+      - **Compact Input Area** = ??? Nothing happens when toggling this on PC.
+      - **Characters Hotswap**: "In the Character Management panel, show quick selection buttons for favorited characters."
+      - **Tags as Folders** = ??? What are tags? How to use them? Link to manual?
+      - **Message Sound** = ??? Has a link to the manual, could extract a one-line summary from there.
+      - **Background Sound Only** = ???
+      - **Custom CSS** = ??? What is the scope where the custom style applies? Just MovingUI, or the whole ST GUI? Where to get an example style to learn how to make new ones?
+      - **Example Messages Behavior**: obvious, but missing a tooltip
+      - **Advanced Character Search** = ???
+      - **Never resize avatars** = ???
+      - **Show avatar filenames** = ??? This seems to affect the *Character Management* panel only, not *Character Expressions* sprites?
+      - **Import Card Tags** = ??? Something to do with the PNG character card thing?
+      - **Spoiler Free Mode** = ???
+      - **"Send" to Continue** = ??? Sending the message to the AI continues the last message instead of generating a new one? How do you generate a new one, then?
+      - **Quick "Continue" button**: "Show a button in the input area to ask the AI to continue (extend) its last message."
+      - **Swipes**: "Generate alternative responses before choosing which one to commit. Shows arrow buttons next to the AI's last message."
+      - **Gestures** = ???
+      - **Auto-load Last Chat**: obvious, but missing a tooltip
+      - **Auto-scroll Chat**: obvious, but missing a tooltip
+      - **Auto-save Message Edits** = ??? When does the autosave happen?
+      - **Confirm Message Deletion**: obvious, but missing a tooltip
+      - **Auto-fix Markdown** = ??? What exactly does it fix in Markdown, and using what algorithm?
+      - **Render Formulas**: "Render LaTeX and JSMath equation notation in chat messages."
+      - **Show {{char}}: in responses**: obvious, but missing a tooltip
+      - **Show {{user}}: in responses**: obvious, but missing a tooltip
+      - **Show tags in responses** = ???
+      - **Log prompts to console**: obvious, but missing a tooltip
+      - **Auto-swipe**: obvious once you expand the panel and look at the available settings, but missing a tooltip.
+        "Automatically reject and re-generate AI message based on configurable criteria."
+      - **Reload Chat** = ??? What exactly gets reloaded?
+    - Probably lots more. Maybe open a ticket and start fixing these?
+
 ### Low priority
@@ -64,7 +148,11 @@
 - BACKEND: Add more postprocessing filters. Possible ideas, no guarantee I'll ever get around to them:
   - Pixelize, posterize (8-bit look)
   - Analog video glitches
-    - Partition image into bands, move some left/right temporarily
+    - Partition image into bands, move some left/right temporarily (for a few frames now that we can do that)
+    - Another effect of bad VHS hsync: dynamic "bending" effect near top edge:
+      - Distortion by horizontal movement
+      - Topmost row of pixels moves the most, then a smoothly decaying offset profile as a function of height (decaying to zero at maybe 20% of image height, measured from the top)
+      - The maximum offset flutters dynamically in a semi-regular, semi-unpredictable manner (use a superposition of three sine waves at different frequencies, as functions of time)
   - Digital data connection glitches
     - Apply to random rectangles; may need to persist for a few frames to animate and/or make them more noticeable
     - May need to protect important regions like the character's head (approximately, from the template); we're after "Hollywood glitchy", not actually glitchy
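
A rough sketch of the VHS hsync "bending" idea from the last hunk, just to pin down the math. This is not the talkinghead postprocessor code: the function names, the raised-cosine falloff, the amplitude, and the three frequencies and phases are all placeholder assumptions, and the image is a plain list of pixel rows rather than whatever tensor format the real filter chain uses.

```python
import math

def max_offset(t, amplitude=8.0, freqs=(0.7, 1.9, 3.1), phases=(0.0, 1.3, 2.6)):
    """Flutter of the topmost row, in pixels, at time t (seconds).

    Superposition of three sine waves; the result stays within
    [-amplitude, +amplitude]. Frequencies (Hz) and phases are made-up values.
    """
    s = sum(math.sin(2.0 * math.pi * f * t + p) for f, p in zip(freqs, phases))
    return amplitude * s / len(freqs)

def row_offsets(height, t, extent=0.2, **kwargs):
    """Horizontal offset for each row: largest at row 0, zero from extent * height down."""
    cutoff = extent * height
    peak = max_offset(t, **kwargs)
    offsets = []
    for y in range(height):
        if y >= cutoff:
            offsets.append(0.0)
        else:
            # Raised-cosine falloff (an assumption): 1 at the top row, 0 at the cutoff.
            w = 0.5 * (1.0 + math.cos(math.pi * y / cutoff))
            offsets.append(peak * w)
    return offsets

def bend(image, t, fill=0, **kwargs):
    """Shift each row of `image` (a list of equal-length rows) by its rounded offset.

    Pixels shifted in from outside the image are set to `fill`.
    """
    height = len(image)
    width = len(image[0])
    out = []
    for row, dx in zip(image, row_offsets(height, t, **kwargs)):
        k = int(round(dx))
        if k > 0:  # shift right
            k = min(k, width)
            out.append([fill] * k + list(row[:width - k]))
        elif k < 0:  # shift left
            k = min(-k, width)
            out.append(list(row[k:]) + [fill] * k)
        else:
            out.append(list(row))
    return out
```

Calling `bend(frame, t)` once per frame, with `t` taken from the animation clock, would give the time-varying wobble. A real implementation would probably want subpixel shifts (interpolation) instead of rounding to whole pixels.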