Decouple animation speed (w.r.t. walltime) from render FPS

We only needed to add nonlinear step scaling to the pose interpolator; all other animation drivers (blinking, talking, breathing) were already defined in terms of wall time.
2026-04-21 06:58:58 +00:00 · 2024-01-09 10:09:42 +02:00
parent 00e7b4ed81
commit 11ff18e52e
2 changed files with 179 additions and 140 deletions
--- a/talkinghead/TODO.md
+++ b/talkinghead/TODO.md
@@ -2,136 +2,6 @@

 ### Live mode

-- Make animation speed independent of target FPS (choppy animation is better than running slower than realtime in a realtime application)
-  - Currently animation works per-frame, so it looks natural only at its design target FPS (25...30)
-  - But we should also allow higher-FPS, smoother animation for users who prefer that and have the hardware to support it
-  - Scale the interpolation step so that higher FPS -> smaller step (and vice versa)
-    - Note the saturating exponential behavior (when target pose held constant); the step size scaling needs to be nonlinear to account for this.
-      Work out the math, should be rather simple:
-      - The pose interpolator is essentially an ODE solver for Newton's law of cooling, with piecewise constant loading.
-      - But instead of numerical integration, we're essentially reading off points from the analytical solution curve, so it's stable regardless of step size.
-      - However, the step is the distance from the current state to the final state (along the "temperature" axis in the "cooling" law), not the time.
-      - Invert this relationship to find out how to scale step size to make the result behave linearly in time.
-      - Then scale the "linear step" by `target_sec / reference_sec`, where `target_sec = 1 / target_fps`, and `reference_sec = 1 / reference_fps = 1 / 25`
-        (or `1 / 30`, whichever looks better in practice).
-
-      - For an initial value `x0`, a constant final value `xf`, and constant step `dt ∈ (0, 1]`, the interpolator produces:
-
-        x1 = x0 + dt (xf - x0) = [1 - dt] x0 + dt xf
-        x2 = x1 + dt (xf - x1) = [1 - dt] x1 + dt xf
-        x3 = x2 + dt (xf - x2) = [1 - dt] x2 + dt xf
-        ...
-
-        Note that with exact arithmetic, if `dt < 1`, the final value is only reached in the limit `n → ∞`.
-        For floating point, this is not the case; eventually the increment becomes small enough that when
-        it is added, nothing happens. After sufficiently many steps, in practice `x` will stop just slightly
-        short of `xf` (on the side it approached the target from).
-
-        (For performance reasons, when approaching zero, one may need to beware of denormals, because those
-         are often handled by a (slow!) microcode-assisted path on modern CPUs. So if the target is zero, it is useful
-         to have some very small cutoff (inside the normal floating-point range) after which we make `x`
-         instantly jump to zero.)
-
-        Inserting the definition of `x1` to `x2`, we have:
-
-        x2 = [1 - dt] ([1 - dt] x0 + dt xf) + dt xf
-           = [1 - dt]² x0 + [1 - dt] dt xf + dt xf
-           = [1 - dt]² x0 + [[1 - dt] + 1] dt xf
-
-        Then inserting `x2` (in terms of `x0`) to `x3`:
-
-        x3 = [1 - dt] ([1 - dt]² x0 + [[1 - dt] + 1] dt xf) + dt xf
-           = [1 - dt]³ x0 + [1 - dt]² dt xf + [1 - dt] dt xf + dt xf
-
-        To simplify notation, define:
-
-        α := 1 - dt
-        β := dt xf
-
-        We have:
-
-        x1 = α  x0 + β
-        x2 = α² x0 + α β + β
-        x3 = α³ x0 + α² β + α β + β
-
-        which suggests that the general pattern is (as can be proven by induction on `n`):
-
-        xn = α**n x0 + β ∑(α**j, j, 0, n - 1)
-
-        Maybe the notation is more elegant with just α?
-
-        x1 = α  x0 + [1 - α] xf
-        x2 = α² x0 + [1 - α] [1 + α] xf
-           = α² x0 + [1 - α²] xf
-        x3 = α³ x0 + [1 - α] [1 + α + α²] xf
-           = α³ x0 + [1 - α³] xf
-        ...
-        xn = α**n x0 + [1 - α] ∑(α**j, j, 0, n - 1) xf
-           = α**n x0 + [1 - α**n] xf
-
-        This allows us to calculate `xn` as a function of `n`. Now the question of linear-in-time scaling becomes:
-        if we want to reach a given `xn` by some given step `m` (instead of the original step `n`), how must we
-        change the step size `dt` (or equivalently, `α`)?
-
-        Can we simplify this further? Yes:
-
-        x1 = α x0 + [1 - α] [[xf - x0] + x0]
-           = [α + [1 - α]] x0 + [1 - α] [xf - x0]
-           = x0 + [1 - α] [xf - x0]
-
-        Rearranging,
-
-        [x1 - x0] / [xf - x0] = 1 - α
-
-        which gives us the relative distance from `x0` to `xf` that is covered in one step. This isn't yet much
-        to write home about (it's essentially just a rearrangement of the definition of `x1`), but next, let's
-        treat `x2` the same way:
-
-        x2 = α² x0 + [1 - α] [1 + α] [[xf - x0] + x0]
-           = [α² x0 + [1 - α²] x0] + [1 - α²] [xf - x0]
-           = [α² + 1 - α²] x0 + [1 - α²] [xf - x0]
-           = x0 + [1 - α²] [xf - x0]
-
-        We obtain
-
-        [x2 - x0] / [xf - x0] = 1 - α²
-
-        which is the relative distance, from the original `x0` toward the final `xf`, that is covered in two steps
-        using the original step size `dt = 1 - α`. Next up, `x3`:
-
-        x3 = α³ x0 + [1 - α³] [[xf - x0] + x0]
-           = α³ x0 + [1 - α³] [xf - x0] + [1 - α³] x0
-           = x0 + [1 - α³] [xf - x0]
-
-        Rearranging,
-
-        [x3 - x0] / [xf - x0] = 1 - α³
-
-        which is the relative distance covered in three steps. Hence,
-
-        xrel := [xn - x0] / [xf - x0] = 1 - α**n
-
-        so that
-
-        α**n = 1 - xrel              (**)
-
-        and (taking the natural logarithm of both sides)
-
-        n log α = log [1 - xrel]
-
-        or
-
-        n = [log [1 - xrel]] / [log α]
-
-        which, given `α = 1 - dt`, analytically gives the `n` where the interpolation has covered the fraction `xrel` of the original distance.
-
-        On the other hand, we can also solve (**) for `α`:
-
-        α = (1 - xrel)**(1 / n)
-
-        which, given desired `n`, gives us the `α` that makes the interpolation cover the fraction `xrel` of the original distance in `n` steps.
-
-
 - Add optional per-character configuration
  - At client end, JSON files in `SillyTavern/public/characters/characternamehere/`
  - Pass the data all the way here (from ST client, to ST server, to ST-extras server, to talkinghead module)
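The closed-form result from the removed TODO notes above can be checked numerically. The following is a minimal sketch, not code from the repository: the helper names `iterate`, `closed_form`, and `steps_to_cover` are hypothetical, and the values of `x0`, `xf`, and `dt` are arbitrary illustration values.

```python
import math


def iterate(x0: float, xf: float, dt: float, n: int) -> float:
    """Apply the update x <- x + dt * (xf - x) a total of n times."""
    x = x0
    for _ in range(n):
        x = x + dt * (xf - x)
    return x


def closed_form(x0: float, xf: float, dt: float, n: int) -> float:
    """Closed form xn = α**n x0 + [1 - α**n] xf, with α = 1 - dt."""
    alpha = 1.0 - dt
    return alpha**n * x0 + (1.0 - alpha**n) * xf


def steps_to_cover(xrel: float, dt: float) -> float:
    """n = log(1 - xrel) / log(α): steps needed to cover the fraction xrel."""
    return math.log(1.0 - xrel) / math.log(1.0 - dt)


x0, xf, dt = 0.2, 1.0, 0.1  # arbitrary illustration values

# The step-by-step iteration and the closed form agree up to rounding.
for n in (1, 2, 3, 10, 50):
    assert math.isclose(iterate(x0, xf, dt, n), closed_form(x0, xf, dt, n), rel_tol=1e-12)

# With dt = 0.1, covering half the distance takes about 6.58 steps.
n_half = steps_to_cover(0.5, dt)
```

Note that `n` is generally not an integer, which is fine here: it is only used as an intermediate quantity when solving for a rescaled `α`.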
--- a/talkinghead/tha3/app/app.py
+++ b/talkinghead/tha3/app/app.py
@@ -61,6 +61,8 @@ current_emotion = "neutral"
 is_talking = False
 global_reload_image = None

+TARGET_FPS = 25
+
 # --------------------------------------------------------------------------------
 # API

@@ -196,7 +198,7 @@ def result_feed() -> Response:
                # How often should we send?
                #  - Excessive spamming can DoS the SillyTavern GUI, so there needs to be a rate limit.
                #  - OTOH, we must constantly send something, or the GUI will lock up waiting.
-                TARGET_FPS = 25
+                # Therefore, send at a target FPS that yields a nice-looking animation.
                frame_duration_target_sec = 1 / TARGET_FPS
                if last_frame_send_complete_time is not None:
                    time_now = time.time_ns()
@@ -613,22 +615,183 @@ class Animator:
        return new_pose

    def interpolate_pose(self, pose: List[float], target_pose: List[float], step: float = 0.1) -> List[float]:
-        """Rate-based pose integrator. Interpolate from `pose` toward `target_pose`.
+        """Interpolate from current `pose` toward `target_pose`.

        `step`: [0, 1]; how far toward `target_pose` to interpolate. 0 is fully `pose`, 1 is fully `target_pose`.

-        Note that looping back the output as `pose`, while keeping `target_pose` constant, causes the current pose
-        to approach `target_pose` on a saturating exponential trajectory, like `1 - exp(-lambda * t)`, for some
-        constant `lambda`.
-
-        This is because `step` is the fraction of the *current* difference between `pose` and `target_pose`,
-        which obviously becomes smaller after each repeat. This is a feature, not a bug!
-
        This is a kind of history-free rate-based formulation, which needs only the current and target poses, and
        the step size; there is no need to keep track of e.g. the initial pose or the progress along the trajectory.
+
+        Note that looping back the output as `pose`, while keeping `target_pose` constant, causes the current pose
+        to approach `target_pose` on a saturating trajectory. This is because `step` is the fraction of the *current*
+        difference between `pose` and `target_pose`, which obviously becomes smaller after each repeat.
+
+        This is a feature, not a bug!
        """
+        # The `step` parameter is calibrated against animation at 25 FPS, so we must scale it appropriately, taking
+        # into account the actual FPS.
+        #
+        # How to do this requires some explanation. Numericist hat on. Let's do a quick back-of-the-envelope calculation.
+        # This pose interpolator is essentially a solver for the first-order ODE:
+        #
+        #   u' = f(u, t)
+        #
+        # Consider the most common case, where the target pose remains constant over several animation frames.
+        # Furthermore, consider just one morph (they all behave similarly). Then our ODE is Newton's law of cooling:
+        #
+        #   u' = -β [u - u∞]
+        #
+        # where `u = u(t)` is the temperature, `u∞` is the constant temperature of the external environment,
+        # and `β > 0` is a material-dependent cooling coefficient.
+        #
+        # But instead of numerical simulation at a constant timestep size, as would be typical in computational science,
+        # we read points off the analytical solution curve. The `step` parameter is *not* the timestep size;
+        # instead, it controls the relative distance along the *u* axis that should be covered in one simulation step,
+        # so it is actually related to the cooling coefficient β.
+        #
+        # (How exactly: write the left-hand side as `[unew - uold] / Δt + O(Δt)`, drop the error term, and decide
+        #  whether to use `uold` (forward Euler) or `unew` (backward Euler) as `u` on the right-hand side. Then compare
+        #  to our update formula. But those details don't matter here.)
+        #
+        # To match the notation in the rest of this code, let us denote the temperature (actually pose morph value) as `x`
+        # (instead of `u`). And to keep notation shorter, let `β := step` (although it's not exactly the `β` of the
+        # continuous-in-time case above).
+        #
+        # To scale the animation speed linearly with regard to FPS, we must invert the relation between simulation step
+        # number `n` and the solution value `x`. For an initial value `x0`, a constant target value `x∞`, and constant
+        # step `β ∈ (0, 1]`, the pose interpolator produces the sequence:
+        #
+        #   x1 = x0 + β [x∞ - x0] = [1 - β] x0 + β x∞
+        #   x2 = x1 + β [x∞ - x1] = [1 - β] x1 + β x∞
+        #   x3 = x2 + β [x∞ - x2] = [1 - β] x2 + β x∞
+        #   ...
+        #
+        # Note that with exact arithmetic, if `β < 1`, the final value is only reached in the limit `n → ∞`.
+        # For floating point, this is not the case. Eventually the increment becomes small enough that when
+        # it is added, nothing happens. After sufficiently many steps, in practice `x` will stop just slightly
+        # short of `x∞` (on the side it approached the target from).
+        #
+        # (For performance reasons, when approaching zero, one may need to beware of denormals, because those
+        #  are often handled by a (slow!) microcode-assisted path on modern CPUs. So especially if the target is zero,
+        #  it is useful to have some very small cutoff (inside the normal floating-point range) after which
+        #  we make `x` instantly jump to the target value.)
+        #
+        # Inserting the definition of `x1` into the formula for `x2`, we can express `x2` in terms of `x0` and `x∞`:
+        #
+        #   x2 = [1 - β] ([1 - β] x0 + β x∞) + β x∞
+        #      = [1 - β]² x0 + [1 - β] β x∞ + β x∞
+        #      = [1 - β]² x0 + [[1 - β] + 1] β x∞
+        #
+        # Then inserting this into the formula for `x3`:
+        #
+        #   x3 = [1 - β] ([1 - β]² x0 + [[1 - β] + 1] β x∞) + β x∞
+        #      = [1 - β]³ x0 + [1 - β]² β x∞ + [1 - β] β x∞ + β x∞
+        #
+        # To simplify notation, define:
+        #
+        #   α := 1 - β
+        #
+        # We have:
+        #
+        #   x1 = α  x0 + [1 - α] x∞
+        #   x2 = α² x0 + [1 - α] [1 + α] x∞
+        #      = α² x0 + [1 - α²] x∞
+        #   x3 = α³ x0 + [1 - α] [1 + α + α²] x∞
+        #      = α³ x0 + [1 - α³] x∞
+        #
+        # This suggests that the general pattern is (as can be proven by induction on `n`):
+        #
+        #   xn = α**n x0 + [1 - α**n] x∞
+        #
+        # This allows us to determine `x` as a function of simulation step number `n`. Now the scaling question becomes:
+        # if we want to reach a given value `xn` by some given step `n_scaled` (instead of the original step `n`),
+        # how must we change the step size `β` (or equivalently, the parameter `α`)?
+        #
+        # To simplify further, observe:
+        #
+        #   x1 = α x0 + [1 - α] [[x∞ - x0] + x0]
+        #      = [α + [1 - α]] x0 + [1 - α] [x∞ - x0]
+        #      = x0 + [1 - α] [x∞ - x0]
+        #
+        # Rearranging yields:
+        #
+        #   [x1 - x0] / [x∞ - x0] = 1 - α
+        #
+        # which gives us the relative distance from `x0` to `x∞` that is covered in one step. This isn't yet much
+        # to write home about (it's essentially just a rearrangement of the definition of `x1`), but next, let's
+        # treat `x2` the same way:
+        #
+        #   x2 = α² x0 + [1 - α] [1 + α] [[x∞ - x0] + x0]
+        #      = [α² x0 + [1 - α²] x0] + [1 - α²] [x∞ - x0]
+        #      = [α² + 1 - α²] x0 + [1 - α²] [x∞ - x0]
+        #      = x0 + [1 - α²] [x∞ - x0]
+        #
+        # We obtain
+        #
+        #   [x2 - x0] / [x∞ - x0] = 1 - α²
+        #
+        # which is the relative distance, from the original `x0` toward the final `x∞`, that is covered in two steps
+        # using the original step size `β = 1 - α`. Next up, `x3`:
+        #
+        #   x3 = α³ x0 + [1 - α³] [[x∞ - x0] + x0]
+        #      = α³ x0 + [1 - α³] [x∞ - x0] + [1 - α³] x0
+        #      = x0 + [1 - α³] [x∞ - x0]
+        #
+        # Rearranging,
+        #
+        #   [x3 - x0] / [x∞ - x0] = 1 - α³
+        #
+        # which is the relative distance covered in three steps. Hence, we have:
+        #
+        #   xrel := [xn - x0] / [x∞ - x0] = 1 - α**n
+        #
+        # so that
+        #
+        #   α**n = 1 - xrel              (**)
+        #
+        # and (taking the natural logarithm of both sides)
+        #
+        #   n log α = log [1 - xrel]
+        #
+        # Finally,
+        #
+        #   n = [log [1 - xrel]] / [log α]
+        #
+        # Given `α`, this gives the `n` where the interpolator has covered the fraction `xrel` of the original distance.
+        # On the other hand, we can also solve (**) for `α`:
+        #
+        #   α = (1 - xrel)**(1 / n)
+        #
+        # which, given desired `n`, gives us the `α` that makes the interpolator cover the fraction `xrel` of the original distance in `n` steps.
+        #
+        POSE_INTERPOLATOR_CALIBRATION_FPS = 25  # FPS for which the default value `step` was calibrated
+        xrel = 0.5  # any value in (0, 1) works; it cancels out, since `alpha_scaled = alpha_orig**(n_orig / n_scaled)`
+        alpha_orig = 1.0 - step
+        if 0 < alpha_orig < 1:
+            avg_render_sec = self.render_duration_statistics.average()
+            if avg_render_sec > 0:
+                avg_render_fps = 1 / avg_render_sec
+                # Even if render completes faster, the `talkinghead` output is rate-limited to `TARGET_FPS` at most.
+                avg_render_fps = min(avg_render_fps, TARGET_FPS)
+            else:  # No statistics available yet; let's assume we're running at `TARGET_FPS`.
+                avg_render_fps = TARGET_FPS
+
+            # For a constant target pose and original `α`, compute the number of animation frames to cover `xrel` of distance from initial pose to final pose.
+            n_orig = math.log(1.0 - xrel) / math.log(alpha_orig)
+            # Compute the scaled `n`. Note the direction: we need a smaller `n` (fewer animation frames) if the render runs slower than the calibration FPS.
+            n_scaled = (avg_render_fps / POSE_INTERPOLATOR_CALIBRATION_FPS) * n_orig
+            # Then compute the `α` that reaches `xrel` distance in `n_scaled` animation frames.
+            alpha_scaled = (1.0 - xrel)**(1 / n_scaled)
+        else:  # avoid some divisions by zero at the extremes
+            alpha_scaled, avg_render_fps = alpha_orig, TARGET_FPS  # no scaling; the FPS is still needed by the debug log below
+        step_scaled = 1.0 - alpha_scaled
+
+        debug_fps = round(avg_render_fps, 1)
+        logger.debug(f"interpolate_pose: step @ {POSE_INTERPOLATOR_CALIBRATION_FPS} FPS = {step}, scaled step @ {debug_fps:.1f} FPS = {step_scaled:0.6g}")
+
        # NOTE: This overwrites blinking, talking, and breathing, but that doesn't matter, because we apply this first.
        # The other animation drivers then modify our result.
+        EPSILON = 1e-8
        new_pose = list(pose)  # copy
        for idx, key in enumerate(posedict_keys):
            # # We now animate blinking *after* interpolating the pose, so when blinking, the eyes close instantly.
@@ -639,7 +802,13 @@ class Animator:
            #     ...

            delta = target_pose[idx] - pose[idx]
-            new_pose[idx] = pose[idx] + step * delta
+            new_pose[idx] = pose[idx] + step_scaled * delta
+
+            # Prevent denormal floats (which are really slow); important when running on CPU and approaching zero.
+            # Our ϵ is really big compared to denormals; but there's no point in continuing to compute ever smaller
+            # differences in the animated value when it has already almost (and visually, completely) reached the target.
+            if abs(new_pose[idx] - target_pose[idx]) < EPSILON:
+                new_pose[idx] = target_pose[idx]
        return new_pose

    # --------------------------------------------------------------------------------
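The effect of the step scaling in `interpolate_pose` can be illustrated in isolation. This is a minimal standalone sketch, assuming a scalar pose value and a known, constant FPS; `scale_step` and `simulate` are hypothetical names, not functions from `app.py`, and the real code instead uses the measured average render time capped at `TARGET_FPS`.

```python
import math

CALIBRATION_FPS = 25  # FPS at which `step` is assumed to have been tuned


def scale_step(step: float, fps: float, xrel: float = 0.5) -> float:
    """Rescale `step` so the interpolator covers the same distance per second at `fps`."""
    alpha = 1.0 - step
    if not 0.0 < alpha < 1.0:  # step of 0 or 1: nothing to scale
        return step
    n_orig = math.log(1.0 - xrel) / math.log(alpha)  # frames to cover `xrel` at calibration FPS
    n_scaled = (fps / CALIBRATION_FPS) * n_orig      # frames available at the actual FPS
    alpha_scaled = (1.0 - xrel) ** (1.0 / n_scaled)
    return 1.0 - alpha_scaled


def simulate(step: float, fps: float, seconds: float, x0: float = 0.0, target: float = 1.0) -> float:
    """Run the scaled interpolator for `seconds` of wall time at a constant `fps`."""
    x = x0
    scaled = scale_step(step, fps)
    for _ in range(round(fps * seconds)):
        x += scaled * (target - x)
    return x


# After one second of wall time, the value is about 0.928 regardless of FPS.
results = {fps: simulate(0.1, fps, 1.0) for fps in (10, 25, 50)}

# The choice of `xrel` cancels out: the scaled α equals α**(CALIBRATION_FPS / fps).
step_a = scale_step(0.1, 50, xrel=0.5)
step_b = scale_step(0.1, 50, xrel=0.9)
```

The second observation suggests that the computation could be collapsed into a single power, `alpha_orig**(POSE_INTERPOLATOR_CALIBRATION_FPS / avg_render_fps)`, although the longer form mirrors the derivation in the comments.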