Decouple animation speed (w.r.t. walltime) from render FPS

We only needed to add nonlinear step scaling to the pose interpolator; all other
animation drivers (blinking, talking, breathing) were already defined in terms
of wall time.
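
As a rough illustration of the nonlinear step scaling mentioned above, here is a minimal Python sketch. The names `scale_step` and `CALIBRATION_FPS` are hypothetical and not part of the commit; the arithmetic follows the recipe given in the `interpolate_pose` comments, assuming `step` was tuned for 25 FPS.

```python
import math

CALIBRATION_FPS = 25  # assumption: the FPS at which `step` was tuned


def scale_step(step: float, fps: float, xrel: float = 0.5) -> float:
    """Return the per-frame step that, at `fps`, matches `step` at CALIBRATION_FPS."""
    alpha = 1.0 - step
    if not (0.0 < alpha < 1.0):  # step of 0 or 1: nothing to scale, avoid log(0) and division by zero
        return step
    n_orig = math.log(1.0 - xrel) / math.log(alpha)  # frames needed to cover `xrel` at calibration FPS
    n_scaled = (fps / CALIBRATION_FPS) * n_orig       # frames available in the same wall time at `fps`
    alpha_scaled = (1.0 - xrel) ** (1.0 / n_scaled)   # alpha that covers `xrel` in `n_scaled` frames
    return 1.0 - alpha_scaled


# Two frames at 50 FPS should cover the same fraction as one frame at 25 FPS.
step_50 = scale_step(0.1, fps=50)
covered_in_two_frames = 1.0 - (1.0 - step_50) ** 2
print(step_50, covered_in_two_frames)
```

Note that the scaled step at 50 FPS is slightly more than half of the original step, which is the nonlinearity the commit message refers to.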
Juha Jeronen
2024-01-09 10:09:42 +02:00
parent 00e7b4ed81
commit 11ff18e52e
2 changed files with 179 additions and 140 deletions


@@ -2,136 +2,6 @@
### Live mode
- Make animation speed independent of target FPS (choppy animation is better than running slower than realtime in a realtime application)
- Currently animation works per-frame, so it looks natural only at its design target FPS (25...30)
- But we should also allow higher-FPS, smoother animation for users who prefer that and have the hardware to support it
- Scale the interpolation step so that higher FPS -> smaller step (and vice versa)
- Note the saturating exponential behavior (when target pose held constant); the step size scaling needs to be nonlinear to account for this.
Work out the math, should be rather simple:
- The pose interpolator is essentially an ODE solver for Newton's law of cooling, with piecewise constant loading.
- But instead of numerical integration, we're essentially reading off points from the analytical solution curve, so it's stable regardless of step size.
- However, the step is the distance from the current state to the final state (along the "temperature" axis in the "cooling" law), not the time.
- Invert this relationship to find out how to scale step size to make the result behave linearly in time.
- Then scale the "linear step" by `target_sec / reference_sec`, where `target_sec = 1 / target_fps`, and `reference_sec = 1 / reference_fps = 1 / 25`
(or `1 / 30`, whichever looks better in practice).
- For an initial value `x0`, a constant final value `xf`, and constant step `dt ∈ (0, 1]`, the interpolator produces:
x1 = x0 + dt (xf - x0) = [1 - dt] x0 + dt xf
x2 = x1 + dt (xf - x1) = [1 - dt] x1 + dt xf
x3 = x2 + dt (xf - x2) = [1 - dt] x2 + dt xf
...
Note that with exact arithmetic, if `dt < 1`, the final value is only reached in the limit `n → ∞`.
For floating point, this is not the case; eventually the increment becomes small enough that when
it is added, nothing happens. After sufficiently many steps, in practice `x` will stop just slightly
short of `xf` (on the side it approached the target from).
(For performance reasons, when approaching zero, one may need to beware of denormals, because those
typically take a (slow!) microcode-assisted path on many modern CPUs. So if the target is zero, it is useful
to have some very small cutoff (inside the normal floating-point range) after which we make `x`
instantly jump to zero.)
Substituting the definition of `x1` into `x2`, we have:
x2 = [1 - dt] ([1 - dt] x0 + dt xf) + dt xf
= [1 - dt]² x0 + [1 - dt] dt xf + dt xf
= [1 - dt]² x0 + [[1 - dt] + 1] dt xf
Then substituting `x2` (in terms of `x0`) into `x3`:
x3 = [1 - dt] ([1 - dt]² x0 + [[1 - dt] + 1] dt xf) + dt xf
= [1 - dt]³ x0 + [1 - dt]² dt xf + [1 - dt] dt xf + dt xf
To simplify notation, define:
α := 1 - dt
β := dt xf
We have:
x1 = α x0 + β
x2 = α² x0 + α β + β
x3 = α³ x0 + α² β + α β + β
which suggests that the general pattern is (as can be proven by induction on `n`):
xn = α**n x0 + β ∑(α**j, j, 0, n - 1)
Maybe the notation is more elegant with just α?
x1 = α x0 + [1 - α] xf
x2 = α² x0 + [1 - α] [1 + α] xf
= α² x0 + [1 - α²] xf
x3 = α³ x0 + [1 - α] [1 + α + α²] xf
= α³ x0 + [1 - α³] xf
...
xn = α**n x0 + [1 - α] ∑(α**j, j, 0, n - 1) xf
= α**n x0 + [1 - α**n] xf
This allows us to calculate `xn` as a function of `n`. Now the question of linear-in-time scaling becomes:
if we want to reach a given `xn` by some given step `m` (instead of the original step `n`), how must we
change the step size `dt` (or equivalently, `α`)?
Can we simplify this further? Yes:
x1 = α x0 + [1 - α] [[xf - x0] + x0]
= [α + [1 - α]] x0 + [1 - α] [xf - x0]
= x0 + [1 - α] [xf - x0]
Rearranging,
[x1 - x0] / [xf - x0] = 1 - α
which gives us the relative distance from `x0` to `xf` that is covered in one step. This isn't yet much
to write home about (it's essentially just a rearrangement of the definition of `x1`), but next, let's
treat `x2` the same way:
x2 = α² x0 + [1 - α] [1 + α] [[xf - x0] + x0]
= [α² x0 + [1 - α²] x0] + [1 - α²] [xf - x0]
= [α² + 1 - α²] x0 + [1 - α²] [xf - x0]
= x0 + [1 - α²] [xf - x0]
We obtain
[x2 - x0] / [xf - x0] = 1 - α²
which is the relative distance, from the original `x0` toward the final `xf`, that is covered in two steps
using the original step size `dt = 1 - α`. Next up, `x3`:
x3 = α³ x0 + [1 - α³] [[xf - x0] + x0]
= α³ x0 + [1 - α³] [xf - x0] + [1 - α³] x0
= x0 + [1 - α³] [xf - x0]
Rearranging,
[x3 - x0] / [xf - x0] = 1 - α³
which is the relative distance covered in three steps. Hence,
xrel := [xn - x0] / [xf - x0] = 1 - α**n
so that
α**n = 1 - xrel (**)
and (taking the natural logarithm of both sides)
n log α = log [1 - xrel]
or
n = [log [1 - xrel]] / [log α]
which, given `α = 1 - dt`, analytically gives the `n` where the interpolation has covered the fraction `xrel` of the original distance.
On the other hand, we can also solve (**) for `α`:
α = (1 - xrel)**(1 / n)
which, given desired `n`, gives us the `α` that makes the interpolation cover the fraction `xrel` of the original distance in `n` steps.
- Add optional per-character configuration
- At client end, JSON files in `SillyTavern/public/characters/characternamehere/`
- Pass the data all the way here (from ST client, to ST server, to ST-extras server, to talkinghead module)
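
The closed-form result derived in the notes above can be checked numerically. The following is a small sketch under stated assumptions: the helper names `iterate`, `closed_form`, and `frames_to_cover` are made up for illustration, and the sample values are arbitrary.

```python
import math


def iterate(x0: float, xf: float, step: float, n: int) -> float:
    """Apply the update x <- x + step * (xf - x) exactly `n` times."""
    x = x0
    for _ in range(n):
        x += step * (xf - x)
    return x


def closed_form(x0: float, xf: float, step: float, n: int) -> float:
    """Evaluate xn = alpha**n * x0 + (1 - alpha**n) * xf, with alpha = 1 - step."""
    alpha = 1.0 - step
    return alpha**n * x0 + (1.0 - alpha**n) * xf


def frames_to_cover(xrel: float, step: float) -> float:
    """Evaluate n = log(1 - xrel) / log(alpha); requires 0 < step < 1 and 0 < xrel < 1."""
    return math.log(1.0 - xrel) / math.log(1.0 - step)


# Arbitrary sample values: start at 0.2, target 1.0, step 0.1, ten frames.
print(iterate(0.2, 1.0, 0.1, 10), closed_form(0.2, 1.0, 0.1, 10))
# Number of frames (generally not an integer) needed to cover half the distance.
print(frames_to_cover(0.5, 0.1))
```

The two printed values in the first line should agree up to floating-point rounding, since the loop and the formula describe the same sequence.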


@@ -61,6 +61,8 @@ current_emotion = "neutral"
is_talking = False
global_reload_image = None
TARGET_FPS = 25
# --------------------------------------------------------------------------------
# API
@@ -196,7 +198,7 @@ def result_feed() -> Response:
# How often should we send?
# - Excessive spamming can DoS the SillyTavern GUI, so there needs to be a rate limit.
# - OTOH, we must constantly send something, or the GUI will lock up waiting.
TARGET_FPS = 25
# Therefore, send at a target FPS that yields a nice-looking animation.
frame_duration_target_sec = 1 / TARGET_FPS
if last_frame_send_complete_time is not None:
time_now = time.time_ns()
@@ -613,22 +615,183 @@ class Animator:
return new_pose
def interpolate_pose(self, pose: List[float], target_pose: List[float], step: float = 0.1) -> List[float]:
"""Rate-based pose integrator. Interpolate from `pose` toward `target_pose`.
"""Interpolate from current `pose` toward `target_pose`.
`step`: [0, 1]; how far toward `target_pose` to interpolate. 0 is fully `pose`, 1 is fully `target_pose`.
Note that looping back the output as `pose`, while keeping `target_pose` constant, causes the current pose
to approach `target_pose` on a saturating exponential trajectory, like `1 - exp(-lambda * t)`, for some
constant `lambda`.
This is because `step` is the fraction of the *current* difference between `pose` and `target_pose`,
which obviously becomes smaller after each repeat. This is a feature, not a bug!
This is a kind of history-free rate-based formulation, which needs only the current and target poses, and
the step size; there is no need to keep track of e.g. the initial pose or the progress along the trajectory.
Note that looping back the output as `pose`, while keeping `target_pose` constant, causes the current pose
to approach `target_pose` on a saturating trajectory. This is because `step` is the fraction of the *current*
difference between `pose` and `target_pose`, which obviously becomes smaller after each repeat.
This is a feature, not a bug!
"""
# The `step` parameter is calibrated against animation at 25 FPS, so we must scale it appropriately, taking
# into account the actual FPS.
#
# How to do this requires some explanation. Numericist hat on. Let's do a quick back-of-the-envelope calculation.
# This pose interpolator is essentially a solver for the first-order ODE:
#
# u' = f(u, t)
#
# Consider the most common case, where the target pose remains constant over several animation frames.
# Furthermore, consider just one morph (they all behave similarly). Then our ODE is Newton's law of cooling:
#
# u' = -β [u - u∞]
#
# where `u = u(t)` is the temperature, `u∞` is the constant temperature of the external environment,
# and `β > 0` is a material-dependent cooling coefficient.
#
# But instead of numerical simulation at a constant timestep size, as would be typical in computational science,
# we read points off the analytical solution curve. The `step` parameter is *not* the timestep size;
# instead, it controls the relative distance along the *u* axis that should be covered in one simulation step,
# so it is actually related to the cooling coefficient β.
#
# (How exactly: write the left-hand side as `[unew - uold] / Δt + O(Δt)`, drop the error term, and decide
# whether to use `uold` (forward Euler) or `unew` (backward Euler) as `u` on the right-hand side. Then compare
# to our update formula. But those details don't matter here.)
#
# To match the notation in the rest of this code, let us denote the temperature (actually pose morph value) as `x`
# (instead of `u`). And to keep notation shorter, let `β := step` (although it's not exactly the `β` of the
# continuous-in-time case above).
#
# To scale the animation speed linearly with regard to FPS, we must invert the relation between simulation step
# number `n` and the solution value `x`. For an initial value `x0`, a constant target value `x∞`, and constant
# step `β ∈ (0, 1]`, the pose interpolator produces the sequence:
#
# x1 = x0 + β [x∞ - x0] = [1 - β] x0 + β x∞
# x2 = x1 + β [x∞ - x1] = [1 - β] x1 + β x∞
# x3 = x2 + β [x∞ - x2] = [1 - β] x2 + β x∞
# ...
#
# Note that with exact arithmetic, if `β < 1`, the final value is only reached in the limit `n → ∞`.
# For floating point, this is not the case. Eventually the increment becomes small enough that when
# it is added, nothing happens. After sufficiently many steps, in practice `x` will stop just slightly
# short of `x∞` (on the side it approached the target from).
#
# (For performance reasons, when approaching zero, one may need to beware of denormals, because those
# typically take a (slow!) microcode-assisted path on many modern CPUs. So especially if the target is zero,
# it is useful to have some very small cutoff (inside the normal floating-point range) after which
# we make `x` instantly jump to the target value.)
#
# Substituting the definition of `x1` into the formula for `x2`, we can express `x2` in terms of `x0` and `x∞`:
#
# x2 = [1 - β] ([1 - β] x0 + β x∞) + β x∞
# = [1 - β]² x0 + [1 - β] β x∞ + β x∞
# = [1 - β]² x0 + [[1 - β] + 1] β x∞
#
# Then substituting this into the formula for `x3`:
#
# x3 = [1 - β] ([1 - β]² x0 + [[1 - β] + 1] β x∞) + β x∞
# = [1 - β]³ x0 + [1 - β]² β x∞ + [1 - β] β x∞ + β x∞
#
# To simplify notation, define:
#
# α := 1 - β
#
# We have:
#
# x1 = α x0 + [1 - α] x∞
# x2 = α² x0 + [1 - α] [1 + α] x∞
# = α² x0 + [1 - α²] x∞
# x3 = α³ x0 + [1 - α] [1 + α + α²] x∞
# = α³ x0 + [1 - α³] x∞
#
# This suggests that the general pattern is (as can be proven by induction on `n`):
#
# xn = α**n x0 + [1 - α**n] x∞
#
# This allows us to determine `x` as a function of simulation step number `n`. Now the scaling question becomes:
# if we want to reach a given value `xn` by some given step `n_scaled` (instead of the original step `n`),
# how must we change the step size `β` (or equivalently, the parameter `α`)?
#
# To simplify further, observe:
#
# x1 = α x0 + [1 - α] [[x∞ - x0] + x0]
# = [α + [1 - α]] x0 + [1 - α] [x∞ - x0]
# = x0 + [1 - α] [x∞ - x0]
#
# Rearranging yields:
#
# [x1 - x0] / [x∞ - x0] = 1 - α
#
# which gives us the relative distance from `x0` to `x∞` that is covered in one step. This isn't yet much
# to write home about (it's essentially just a rearrangement of the definition of `x1`), but next, let's
# treat `x2` the same way:
#
# x2 = α² x0 + [1 - α] [1 + α] [[x∞ - x0] + x0]
# = [α² x0 + [1 - α²] x0] + [1 - α²] [x∞ - x0]
# = [α² + 1 - α²] x0 + [1 - α²] [x∞ - x0]
# = x0 + [1 - α²] [x∞ - x0]
#
# We obtain
#
# [x2 - x0] / [x∞ - x0] = 1 - α²
#
# which is the relative distance, from the original `x0` toward the final `x∞`, that is covered in two steps
# using the original step size `β = 1 - α`. Next up, `x3`:
#
# x3 = α³ x0 + [1 - α³] [[x∞ - x0] + x0]
# = α³ x0 + [1 - α³] [x∞ - x0] + [1 - α³] x0
# = x0 + [1 - α³] [x∞ - x0]
#
# Rearranging,
#
# [x3 - x0] / [x∞ - x0] = 1 - α³
#
# which is the relative distance covered in three steps. Hence, we have:
#
# xrel := [xn - x0] / [x∞ - x0] = 1 - α**n
#
# so that
#
# α**n = 1 - xrel (**)
#
# and (taking the natural logarithm of both sides)
#
# n log α = log [1 - xrel]
#
# Finally,
#
# n = [log [1 - xrel]] / [log α]
#
# Given `α`, this gives the `n` where the interpolator has covered the fraction `xrel` of the original distance.
# On the other hand, we can also solve (**) for `α`:
#
# α = (1 - xrel)**(1 / n)
#
# which, given desired `n`, gives us the `α` that makes the interpolator cover the fraction `xrel` of the original distance in `n` steps.
#
POSE_INTERPOLATOR_CALIBRATION_FPS = 25  # FPS for which the default value `step` was calibrated
xrel = 0.5  # just some convenient value
alpha_orig = 1.0 - step
# Estimate the render FPS up front, so that `avg_render_fps` is always defined for the debug log below
# (also when `step` is exactly 0 or 1 and the scaling branch is skipped).
avg_render_sec = self.render_duration_statistics.average()
if avg_render_sec > 0:
    avg_render_fps = 1 / avg_render_sec
    # Even if render completes faster, the `talkinghead` output is rate-limited to `TARGET_FPS` at most.
    avg_render_fps = min(avg_render_fps, TARGET_FPS)
else:  # No statistics available yet; let's assume we're running at `TARGET_FPS`.
    avg_render_fps = TARGET_FPS
if 0 < alpha_orig < 1:
    # For a constant target pose and original `α`, compute the number of animation frames to cover `xrel` of distance from initial pose to final pose.
    n_orig = math.log(1.0 - xrel) / math.log(alpha_orig)
    # Compute the scaled `n`. Note the direction: we need a smaller `n` (fewer animation frames) if the render runs slower than the calibration FPS.
    n_scaled = (avg_render_fps / POSE_INTERPOLATOR_CALIBRATION_FPS) * n_orig
    # Then compute the `α` that reaches `xrel` distance in `n_scaled` animation frames.
    alpha_scaled = (1.0 - xrel)**(1 / n_scaled)
else:  # avoid some divisions by zero at the extremes
    alpha_scaled = alpha_orig
step_scaled = 1.0 - alpha_scaled
debug_fps = round(avg_render_fps, 1)
logger.debug(f"interpolate_pose: step @ {POSE_INTERPOLATOR_CALIBRATION_FPS} FPS = {step}, scaled step @ {debug_fps:.1f} FPS = {step_scaled:0.6g}")
# NOTE: This overwrites blinking, talking, and breathing, but that doesn't matter, because we apply this first.
# The other animation drivers then modify our result.
EPSILON = 1e-8
new_pose = list(pose) # copy
for idx, key in enumerate(posedict_keys):
# # We now animate blinking *after* interpolating the pose, so when blinking, the eyes close instantly.
@@ -639,7 +802,13 @@ class Animator:
# ...
delta = target_pose[idx] - pose[idx]
new_pose[idx] = pose[idx] + step * delta
new_pose[idx] = pose[idx] + step_scaled * delta
# Prevent denormal floats (which are really slow); important when running on CPU and approaching zero.
# Our ϵ is really big compared to denormals; but there's no point in continuing to compute ever smaller
# differences in the animated value when it has already almost (and visually, completely) reached the target.
if abs(new_pose[idx] - target_pose[idx]) < EPSILON:
new_pose[idx] = target_pose[idx]
return new_pose
# --------------------------------------------------------------------------------
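
One consequence of the formulas in the `interpolate_pose` comments is that the choice of `xrel` cancels out: substituting `n_scaled` into `α = (1 - xrel)**(1 / n)` reduces to `alpha_scaled = alpha_orig**(calibration_fps / fps)`. The sketch below checks this numerically and confirms that the pose reached after a fixed wall time does not depend on the FPS. The function names are hypothetical, `step` is assumed to lie strictly between 0 and 1, and none of this code is part of the commit.

```python
import math


def scaled_step_direct(step: float, fps: float, calibration_fps: float = 25.0) -> float:
    """Simplified form: alpha_scaled = alpha_orig ** (calibration_fps / fps)."""
    return 1.0 - (1.0 - step) ** (calibration_fps / fps)


def scaled_step_via_xrel(step: float, fps: float, xrel: float, calibration_fps: float = 25.0) -> float:
    """Three-step recipe from the code comments, with an explicit `xrel`."""
    n_orig = math.log(1.0 - xrel) / math.log(1.0 - step)
    n_scaled = (fps / calibration_fps) * n_orig
    return 1.0 - (1.0 - xrel) ** (1.0 / n_scaled)


def pose_after(seconds: float, fps: float, step: float = 0.1, x0: float = 0.0, target: float = 1.0) -> float:
    """Run the interpolator for `seconds` of wall time at `fps`, with a constant target."""
    x = x0
    s = scaled_step_direct(step, fps)
    for _ in range(round(seconds * fps)):
        x += s * (target - x)
    return x


# The same scaled step results regardless of which `xrel` is used.
for xrel in (0.1, 0.5, 0.9):
    print(xrel, scaled_step_via_xrel(0.1, 60.0, xrel), scaled_step_direct(0.1, 60.0))

# One second of wall time at different frame rates should land on (nearly) the same value.
print(pose_after(1.0, 10), pose_after(1.0, 25), pose_after(1.0, 50))
```

If this holds, `xrel = 0.5` in the committed code is indeed "just some convenient value", and the logarithms could in principle be replaced by a single power.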