mirror of
https://github.com/SillyTavern/SillyTavern-Extras.git
synced 2026-04-21 06:58:58 +00:00
Decouple animation speed (w.r.t. walltime) from render FPS
We only needed to add nonlinear step scaling to the pose interpolator; all other animation drivers (blinking, talking, breathing) were already defined in terms of wall time.
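The commit message describes the idea in words. The minimal Python sketch below makes it concrete; it is illustrative only and is not the code from this commit. It assumes a step calibrated at 25 FPS and rescales it per frame as `1 - (1 - step) ** (calibration_fps / fps)`. That rescaling keeps the per-second decay factor `(1 - step) ** calibration_fps` unchanged. The helper names `scaled_step` and `simulate` are hypothetical.

```python
def scaled_step(step_ref: float, ref_fps: float, fps: float) -> float:
    """Hypothetical helper: per-frame step at `fps` matching the per-second progress of `step_ref` at `ref_fps`."""
    alpha_ref = 1.0 - step_ref
    return 1.0 - alpha_ref ** (ref_fps / fps)


def simulate(x0: float, target: float, step: float, frames: int) -> float:
    """Run the rate-based update `x <- x + step * (target - x)` for `frames` frames."""
    x = x0
    for _ in range(frames):
        x += step * (target - x)
    return x


REF_FPS = 25
STEP_REF = 0.1

# One second of wall time, at the calibration FPS and at 60 FPS.
x_ref = simulate(0.0, 1.0, STEP_REF, frames=REF_FPS)
x_fast_naive = simulate(0.0, 1.0, STEP_REF, frames=60)  # unscaled step: too much progress per second
x_fast_scaled = simulate(0.0, 1.0, scaled_step(STEP_REF, REF_FPS, 60), frames=60)

print(f"25 FPS, reference step: {x_ref:.6f}")
print(f"60 FPS, unscaled step:  {x_fast_naive:.6f}")
print(f"60 FPS, scaled step:    {x_fast_scaled:.6f}")
```

Compare the runs after one second of wall time. The 60 FPS run with the rescaled step ends at the same value as the 25 FPS reference. Reusing the unscaled step at 60 FPS gets much closer to the target than it should.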
@@ -2,136 +2,6 @@
### Live mode

- Make animation speed independent of target FPS (choppy animation is better than running slower than realtime in a realtime application)
- Currently animation works per-frame, so it looks natural only at its design target FPS (25...30)
- But we should also allow higher-FPS, smoother animation for users who prefer that and have the hardware to support it
- Scale the interpolation step so that higher FPS -> smaller step (and vice versa)
- Note the saturating exponential behavior (when the target pose is held constant); the step size scaling needs to be nonlinear to account for this.
Work out the math; it should be rather simple:
- The pose interpolator is essentially an ODE solver for Newton's law of cooling, with piecewise constant loading.
- But instead of numerical integration, we're essentially reading off points from the analytical solution curve, so it's stable regardless of step size.
- However, the step is not a time increment: it is the fraction of the remaining distance from the current state to the final state (along the "temperature" axis of the "cooling" law) that is covered in one update.
- Invert this relationship to find out how to scale the step size so that the result behaves linearly in time.
- Then scale the "linear step" by `target_sec / reference_sec`, where `target_sec = 1 / target_fps` and `reference_sec = 1 / reference_fps = 1 / 25` (or `1 / 30`, whichever looks better in practice).
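The need for a nonlinear scaling can be checked numerically. The small sketch below assumes a 25 FPS reference; its helper names are hypothetical. It compares two rescalings:

- the naive one, `step * target_sec / reference_sec`;
- the one that keeps the fraction of the distance still left after one second, `(1 - step) ** reference_fps`, fixed.

```python
REFERENCE_FPS = 25


def linear_scaled_step(step: float, target_fps: float) -> float:
    """Naive rescaling: multiply the step by target_sec / reference_sec."""
    target_sec = 1 / target_fps
    reference_sec = 1 / REFERENCE_FPS
    return step * target_sec / reference_sec


def exact_scaled_step(step: float, target_fps: float) -> float:
    """Nonlinear rescaling: keep (1 - step) ** REFERENCE_FPS, the fraction left after one second, unchanged."""
    return 1.0 - (1.0 - step) ** (REFERENCE_FPS / target_fps)


def remaining_after_one_second(step: float, fps: int) -> float:
    """Fraction of the initial distance still left after one second of frames at `fps`."""
    return (1.0 - step) ** fps


for step in (0.05, 0.3, 0.8):
    ref = remaining_after_one_second(step, REFERENCE_FPS)
    lin = remaining_after_one_second(linear_scaled_step(step, 60), 60)
    exact = remaining_after_one_second(exact_scaled_step(step, 60), 60)
    print(f"step={step}: 25 FPS reference {ref:.3e}, 60 FPS linear {lin:.3e}, 60 FPS nonlinear {exact:.3e}")
```

The naive version agrees only approximately, and only for small steps. For large steps at low FPS it can even leave the valid range `(0, 1]`: mathematically, a step of `0.8` at 10 FPS scales to `2.0`.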

- For an initial value `x0`, a constant final value `xf`, and constant step `dt ∈ (0, 1]`, the interpolator produces:

x1 = x0 + dt (xf - x0) = [1 - dt] x0 + dt xf
x2 = x1 + dt (xf - x1) = [1 - dt] x1 + dt xf
x3 = x2 + dt (xf - x2) = [1 - dt] x2 + dt xf
...

Note that with exact arithmetic, if `dt < 1`, the final value is only reached in the limit `n → ∞`.
In floating point, the iteration instead stalls after finitely many steps: eventually the increment becomes
small enough that adding it changes nothing, so in practice `x` stops just slightly short of `xf`
(on the side it approached the target from).

(For performance reasons, when approaching zero, one may need to beware of denormals, because on many
CPUs arithmetic on those takes a much slower path (e.g. microcode assists or software traps). So if the
target is zero, it is useful to have some very small cutoff (inside the normal floating-point range)
after which we make `x` instantly jump to zero.)
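Both floating-point effects described above are easy to observe. The sketch below (helper names hypothetical) runs two experiments:

1. It iterates until an update no longer changes `x`.
2. It repeats the approach toward zero with a snap-to-target cutoff.

The cutoff value `1e-8` is borrowed from the `EPSILON` used later in this diff.

```python
def run_until_stalled(x0: float, xf: float, dt: float, max_steps: int = 100_000) -> tuple[float, int]:
    """Iterate x <- x + dt (xf - x) until an update no longer changes x; return (x, steps taken)."""
    x = x0
    for n in range(1, max_steps + 1):
        x_new = x + dt * (xf - x)
        if x_new == x:  # the increment was rounded away: the iteration has stalled
            return x, n
        x = x_new
    return x, max_steps


def run_with_cutoff(x0: float, xf: float, dt: float, steps: int, eps: float = 1e-8) -> float:
    """Same update, but snap to the target once within `eps` of it.

    Near a zero target this also keeps `x` out of the denormal range.
    """
    x = x0
    for _ in range(steps):
        x = x + dt * (xf - x)
        if abs(x - xf) < eps:
            x = xf
    return x


x_stalled, n_stalled = run_until_stalled(0.0, 1.0, 0.1)
print(f"stalled after {n_stalled} steps: x = {x_stalled!r}, 1 - x = {1.0 - x_stalled:.2e}")

x_snapped = run_with_cutoff(1.0, 0.0, 0.1, steps=1000)
print(f"approaching zero with a cutoff: x = {x_snapped!r}")
```

With IEEE 754 doubles, the first run stops a few units in the last place below `1.0` and never reaches it. The second run lands exactly on `0.0`.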

Inserting the definition of `x1` into the formula for `x2`, we have:

x2 = [1 - dt] ([1 - dt] x0 + dt xf) + dt xf
   = [1 - dt]² x0 + [1 - dt] dt xf + dt xf
   = [1 - dt]² x0 + [[1 - dt] + 1] dt xf

Then inserting `x2` (in terms of `x0`) into the formula for `x3`:

x3 = [1 - dt] ([1 - dt]² x0 + [[1 - dt] + 1] dt xf) + dt xf
   = [1 - dt]³ x0 + [1 - dt]² dt xf + [1 - dt] dt xf + dt xf

To simplify notation, define:

α := 1 - dt
β := dt xf

We have:

x1 = α x0 + β
x2 = α² x0 + α β + β
x3 = α³ x0 + α² β + α β + β

which suggests that the general pattern is (as can be proven by induction on `n`):

xn = α**n x0 + β ∑(α**j, j, 0, n - 1)

Maybe the notation is more elegant with just α?

x1 = α x0 + [1 - α] xf
x2 = α² x0 + [1 - α] [1 + α] xf
   = α² x0 + [1 - α²] xf
x3 = α³ x0 + [1 - α] [1 + α + α²] xf
   = α³ x0 + [1 - α³] xf
...
xn = α**n x0 + [1 - α] ∑(α**j, j, 0, n - 1) xf
   = α**n x0 + [1 - α**n] xf

where the last step uses the telescoping geometric sum `[1 - α] ∑(α**j, j, 0, n - 1) = 1 - α**n`.
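The closed form can be checked against the step-by-step recurrence. The check below also includes the geometric-sum form with `β := dt xf`. The function names and the sample values `x0 = 0.2`, `xf = 0.9`, `dt = 0.15` are arbitrary choices for illustration.

```python
def iterate(x0: float, xf: float, dt: float, n: int) -> float:
    """Apply x <- x + dt (xf - x) exactly n times."""
    x = x0
    for _ in range(n):
        x = x + dt * (xf - x)
    return x


def closed_form(x0: float, xf: float, dt: float, n: int) -> float:
    """xn = α**n x0 + [1 - α**n] xf, with α = 1 - dt."""
    alpha = 1.0 - dt
    return alpha**n * x0 + (1.0 - alpha**n) * xf


def sum_form(x0: float, xf: float, dt: float, n: int) -> float:
    """xn = α**n x0 + β ∑(α**j, j, 0, n - 1), with α = 1 - dt and β = dt xf."""
    alpha, beta = 1.0 - dt, dt * xf
    return alpha**n * x0 + beta * sum(alpha**j for j in range(n))


X0, XF, DT = 0.2, 0.9, 0.15
for n in (1, 2, 3, 10, 50):
    print(n, iterate(X0, XF, DT, n), closed_form(X0, XF, DT, n), sum_form(X0, XF, DT, n))
```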

This allows us to calculate `xn` as a function of `n`. Now the question of linear-in-time scaling becomes:
if we want to reach a given `xn` by some given step `m` (instead of the original step `n`), how must we
change the step size `dt` (or equivalently, `α`)?

Can we simplify this further? Yes:

x1 = α x0 + [1 - α] [[xf - x0] + x0]
   = [α + [1 - α]] x0 + [1 - α] [xf - x0]
   = x0 + [1 - α] [xf - x0]

Rearranging,

[x1 - x0] / [xf - x0] = 1 - α

which gives us the relative distance from `x0` to `xf` that is covered in one step. This isn't yet much
to write home about (it's essentially just a rearrangement of the definition of `x1`), but next, let's
treat `x2` the same way:

x2 = α² x0 + [1 - α] [1 + α] [[xf - x0] + x0]
   = [α² x0 + [1 - α²] x0] + [1 - α²] [xf - x0]
   = [α² + 1 - α²] x0 + [1 - α²] [xf - x0]
   = x0 + [1 - α²] [xf - x0]

We obtain

[x2 - x0] / [xf - x0] = 1 - α²

which is the relative distance, from the original `x0` toward the final `xf`, that is covered in two steps
using the original step size `dt = 1 - α`. Next up, `x3`:

x3 = α³ x0 + [1 - α³] [[xf - x0] + x0]
   = α³ x0 + [1 - α³] [xf - x0] + [1 - α³] x0
   = x0 + [1 - α³] [xf - x0]

Rearranging,

[x3 - x0] / [xf - x0] = 1 - α³

which is the relative distance covered in three steps. Hence,

xrel := [xn - x0] / [xf - x0] = 1 - α**n

so that

α**n = 1 - xrel     (**)

and (taking the natural logarithm of both sides)

n log α = log [1 - xrel]

or

n = [log [1 - xrel]] / [log α]

which, given `α = 1 - dt`, analytically gives the `n` where the interpolation has covered the fraction `xrel` of the original distance.

On the other hand, we can also solve (**) for `α`:

α = (1 - xrel)**(1 / n)

which, given desired `n`, gives us the `α` that makes the interpolation cover the fraction `xrel` of the original distance in `n` steps.
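The two inversions can be wrapped as small helpers (the names are hypothetical). The sketch also shows that `n` generally comes out fractional. That is harmless here, because `n` serves only as an intermediate quantity.

```python
import math


def steps_to_cover(xrel: float, alpha: float) -> float:
    """n = log(1 - xrel) / log(alpha): steps needed to cover the fraction `xrel` of the distance."""
    return math.log(1.0 - xrel) / math.log(alpha)


def alpha_for_steps(xrel: float, n: float) -> float:
    """alpha = (1 - xrel) ** (1 / n): the alpha that covers the fraction `xrel` in `n` steps."""
    return (1.0 - xrel) ** (1.0 / n)


dt = 0.1
alpha = 1.0 - dt

n_half = steps_to_cover(0.5, alpha)        # ≈ 6.58 steps to cover half the distance
alpha_back = alpha_for_steps(0.5, n_half)  # round trip recovers alpha = 0.9
alpha_slow = alpha_for_steps(0.5, 10)      # alpha that covers half the distance in 10 steps instead
print(n_half, alpha_back, alpha_slow, 1.0 - alpha_slow)
```

Both helpers assume `0 < alpha < 1` and `0 <= xrel < 1`. At `alpha = 1` the division by `log(1) = 0` fails, and at `alpha = 0` the logarithm itself is undefined.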

- Add optional per-character configuration
- At client end, JSON files in `SillyTavern/public/characters/characternamehere/`
- Pass the data all the way here (from ST client, to ST server, to ST-extras server, to talkinghead module)

@@ -61,6 +61,8 @@ current_emotion = "neutral"
is_talking = False
global_reload_image = None

TARGET_FPS = 25

# --------------------------------------------------------------------------------
# API

@@ -196,7 +198,7 @@ def result_feed() -> Response:
# How often should we send?
# - Excessive spamming can DoS the SillyTavern GUI, so there needs to be a rate limit.
# - OTOH, we must constantly send something, or the GUI will lock up waiting.
TARGET_FPS = 25
# Therefore, send at a target FPS that yields a nice-looking animation.
frame_duration_target_sec = 1 / TARGET_FPS
if last_frame_send_complete_time is not None:
    time_now = time.time_ns()
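The hunk above shows only the start of the rate limiter. The following is a hedged sketch of the kind of pacing it sets up. It assumes that the elided remainder waits out whatever is left of the `1 / TARGET_FPS` frame budget. The helper `time_to_sleep_sec` is hypothetical and not taken from the source.

```python
import time
from typing import Optional

TARGET_FPS = 25


def time_to_sleep_sec(last_send_complete_ns: Optional[int], now_ns: int, target_fps: float = TARGET_FPS) -> float:
    """Hypothetical helper: how long to wait so that consecutive sends are at least 1 / target_fps apart."""
    frame_duration_target_sec = 1 / target_fps
    if last_send_complete_ns is None:  # nothing sent yet: send immediately
        return 0.0
    elapsed_sec = (now_ns - last_send_complete_ns) / 1e9
    return max(0.0, frame_duration_target_sec - elapsed_sec)


# Usage: pace a few iterations of a dummy send loop.
last_send_complete_ns = None
for _ in range(3):
    time.sleep(time_to_sleep_sec(last_send_complete_ns, time.monotonic_ns()))
    # ... send the frame here ...
    last_send_complete_ns = time.monotonic_ns()
```

The helper takes plain nanosecond timestamps, so it works with `time.time_ns()` as in the hunk. The usage loop uses `time.monotonic_ns()` instead, because that clock does not jump when the system clock is adjusted.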

@@ -613,22 +615,183 @@ class Animator:
        return new_pose

    def interpolate_pose(self, pose: List[float], target_pose: List[float], step: float = 0.1) -> List[float]:
        """Rate-based pose integrator. Interpolate from `pose` toward `target_pose`.
        """Interpolate from current `pose` toward `target_pose`.

        `step`: [0, 1]; how far toward `target_pose` to interpolate. 0 is fully `pose`, 1 is fully `target_pose`.

        Note that looping back the output as `pose`, while keeping `target_pose` constant, causes the current pose
        to approach `target_pose` on a saturating exponential trajectory, like `1 - exp(-lambda * t)`, for some
        constant `lambda`.

        This is because `step` is the fraction of the *current* difference between `pose` and `target_pose`,
        which obviously becomes smaller after each repeat. This is a feature, not a bug!

        This is a kind of history-free rate-based formulation, which needs only the current and target poses, and
        the step size; there is no need to keep track of e.g. the initial pose or the progress along the trajectory.

        Note that looping back the output as `pose`, while keeping `target_pose` constant, causes the current pose
        to approach `target_pose` on a saturating trajectory. This is because `step` is the fraction of the *current*
        difference between `pose` and `target_pose`, which obviously becomes smaller after each repeat.

        This is a feature, not a bug!
        """
        # The `step` parameter is calibrated against animation at 25 FPS, so we must scale it appropriately, taking
        # into account the actual FPS.
        #
        # How to do this requires some explanation. Numericist hat on. Let's do a quick back-of-the-envelope calculation.
        # This pose interpolator is essentially a solver for the first-order ODE:
        #
        #   u' = f(u, t)
        #
        # Consider the most common case, where the target pose remains constant over several animation frames.
        # Furthermore, consider just one morph (they all behave similarly). Then our ODE is Newton's law of cooling:
        #
        #   u' = -β [u - u∞]
        #
        # where `u = u(t)` is the temperature, `u∞` is the constant temperature of the external environment,
        # and `β > 0` is a material-dependent cooling coefficient.
        #
        # But instead of numerical simulation at a constant timestep size, as would be typical in computational science,
        # we read off points from the analytical solution curve. The `step` parameter is *not* the timestep size;
        # instead, it controls the relative distance along the *u* axis that should be covered in one simulation step,
        # so it is actually related to the cooling coefficient β.
        #
        # (How exactly: write the left-hand side as `[unew - uold] / Δt + O(Δt)`, drop the error term, and decide
        # whether to use `uold` (forward Euler) or `unew` (backward Euler) as `u` on the right-hand side. Then compare
        # to our update formula. But those details don't matter here.)
        #
        # To match the notation in the rest of this code, let us denote the temperature (actually pose morph value) as `x`
        # (instead of `u`). And to keep notation shorter, let `β := step` (although it's not exactly the `β` of the
        # continuous-in-time case above).
        #
        # To make the animation speed independent of FPS (i.e. linear in wall time), we must invert the relation between
        # simulation step number `n` and the solution value `x`. For an initial value `x0`, a constant target value `x∞`,
        # and constant step `β ∈ (0, 1]`, the pose interpolator produces the sequence:
        #
        #   x1 = x0 + β [x∞ - x0] = [1 - β] x0 + β x∞
        #   x2 = x1 + β [x∞ - x1] = [1 - β] x1 + β x∞
        #   x3 = x2 + β [x∞ - x2] = [1 - β] x2 + β x∞
        #   ...
        #
        # Note that with exact arithmetic, if `β < 1`, the final value is only reached in the limit `n → ∞`.
        # In floating point, the iteration instead stalls after finitely many steps: eventually the increment becomes
        # small enough that adding it changes nothing, so in practice `x` stops just slightly short of `x∞`
        # (on the side it approached the target from).
        #
        # (For performance reasons, when approaching zero, one may need to beware of denormals, because on many
        # CPUs arithmetic on those takes a much slower path (e.g. microcode assists or software traps). So especially
        # if the target is zero, it is useful to have some very small cutoff (inside the normal floating-point range)
        # after which we make `x` instantly jump to the target value.)
        #
        # Inserting the definition of `x1` into the formula for `x2`, we can express `x2` in terms of `x0` and `x∞`:
        #
        #   x2 = [1 - β] ([1 - β] x0 + β x∞) + β x∞
        #      = [1 - β]² x0 + [1 - β] β x∞ + β x∞
        #      = [1 - β]² x0 + [[1 - β] + 1] β x∞
        #
        # Then inserting this into the formula for `x3`:
        #
        #   x3 = [1 - β] ([1 - β]² x0 + [[1 - β] + 1] β x∞) + β x∞
        #      = [1 - β]³ x0 + [1 - β]² β x∞ + [1 - β] β x∞ + β x∞
        #
        # To simplify notation, define:
        #
        #   α := 1 - β
        #
        # We have:
        #
        #   x1 = α x0 + [1 - α] x∞
        #   x2 = α² x0 + [1 - α] [1 + α] x∞
        #      = α² x0 + [1 - α²] x∞
        #   x3 = α³ x0 + [1 - α] [1 + α + α²] x∞
        #      = α³ x0 + [1 - α³] x∞
        #
        # This suggests that the general pattern is (as can be proven by induction on `n`):
        #
        #   xn = α**n x0 + [1 - α**n] x∞
        #
        # This allows us to determine `x` as a function of simulation step number `n`. Now the scaling question becomes:
        # if we want to reach a given value `xn` by some given step `n_scaled` (instead of the original step `n`),
        # how must we change the step size `β` (or equivalently, the parameter `α`)?
        #
        # To simplify further, observe:
        #
        #   x1 = α x0 + [1 - α] [[x∞ - x0] + x0]
        #      = [α + [1 - α]] x0 + [1 - α] [x∞ - x0]
        #      = x0 + [1 - α] [x∞ - x0]
        #
        # Rearranging yields:
        #
        #   [x1 - x0] / [x∞ - x0] = 1 - α
        #
        # which gives us the relative distance from `x0` to `x∞` that is covered in one step. This isn't yet much
        # to write home about (it's essentially just a rearrangement of the definition of `x1`), but next, let's
        # treat `x2` the same way:
        #
        #   x2 = α² x0 + [1 - α] [1 + α] [[x∞ - x0] + x0]
        #      = [α² x0 + [1 - α²] x0] + [1 - α²] [x∞ - x0]
        #      = [α² + 1 - α²] x0 + [1 - α²] [x∞ - x0]
        #      = x0 + [1 - α²] [x∞ - x0]
        #
        # We obtain
        #
        #   [x2 - x0] / [x∞ - x0] = 1 - α²
        #
        # which is the relative distance, from the original `x0` toward the final `x∞`, that is covered in two steps
        # using the original step size `β = 1 - α`. Next up, `x3`:
        #
        #   x3 = α³ x0 + [1 - α³] [[x∞ - x0] + x0]
        #      = α³ x0 + [1 - α³] [x∞ - x0] + [1 - α³] x0
        #      = x0 + [1 - α³] [x∞ - x0]
        #
        # Rearranging,
        #
        #   [x3 - x0] / [x∞ - x0] = 1 - α³
        #
        # which is the relative distance covered in three steps. Hence, we have:
        #
        #   xrel := [xn - x0] / [x∞ - x0] = 1 - α**n
        #
        # so that
        #
        #   α**n = 1 - xrel     (**)
        #
        # and (taking the natural logarithm of both sides)
        #
        #   n log α = log [1 - xrel]
        #
        # Finally,
        #
        #   n = [log [1 - xrel]] / [log α]
        #
        # Given `α`, this gives the `n` where the interpolator has covered the fraction `xrel` of the original distance.
        # On the other hand, we can also solve (**) for `α`:
        #
        #   α = (1 - xrel)**(1 / n)
        #
        # which, given desired `n`, gives us the `α` that makes the interpolator cover the fraction `xrel` of the original distance in `n` steps.
        #
        POSE_INTERPOLATOR_CALIBRATION_FPS = 25  # FPS for which the default value `step` was calibrated
        xrel = 0.5  # just some convenient value
        alpha_orig = 1.0 - step

        # Estimate the effective output FPS first, so that `avg_render_fps` is defined for the debug log below
        # even when `step` is exactly 0 or 1 and the scaling branch is skipped.
        avg_render_sec = self.render_duration_statistics.average()
        if avg_render_sec > 0:
            avg_render_fps = 1 / avg_render_sec
            # Even if render completes faster, the `talkinghead` output is rate-limited to `TARGET_FPS` at most.
            avg_render_fps = min(avg_render_fps, TARGET_FPS)
        else:  # No statistics available yet; let's assume we're running at `TARGET_FPS`.
            avg_render_fps = TARGET_FPS

        if 0 < alpha_orig < 1:
            # For a constant target pose and original `α`, compute the number of animation frames to cover `xrel` of distance from initial pose to final pose.
            n_orig = math.log(1.0 - xrel) / math.log(alpha_orig)
            # Compute the scaled `n`. Note the direction: we need a smaller `n` (fewer animation frames) if the render runs slower than the calibration FPS.
            n_scaled = (avg_render_fps / POSE_INTERPOLATOR_CALIBRATION_FPS) * n_orig
            # Then compute the `α` that reaches `xrel` distance in `n_scaled` animation frames.
            alpha_scaled = (1.0 - xrel)**(1 / n_scaled)
        else:  # at the extremes, avoid log(0) and division by log(1) = 0
            alpha_scaled = alpha_orig
        step_scaled = 1.0 - alpha_scaled

        debug_fps = round(avg_render_fps, 1)
        logger.debug(f"interpolate_pose: step @ {POSE_INTERPOLATOR_CALIBRATION_FPS} FPS = {step}, scaled step @ {debug_fps:.1f} FPS = {step_scaled:0.6g}")

        # NOTE: This overwrites blinking, talking, and breathing, but that doesn't matter, because we apply this first.
        # The other animation drivers then modify our result.
        EPSILON = 1e-8
        new_pose = list(pose)  # copy
        for idx, key in enumerate(posedict_keys):
            # # We now animate blinking *after* interpolating the pose, so when blinking, the eyes close instantly.
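The computation in this hunk can be simplified algebraically. Substituting `n_orig = log(1 - xrel) / log(α)` into `(1 - xrel)**(1 / n_scaled)` gives `exp(log(α) · calibration_fps / fps)`, that is, `α_scaled = α_orig ** (calibration_fps / fps)`. The particular value of `xrel` therefore drops out. The sketch below re-implements both routes as standalone functions (names hypothetical) and compares them numerically.

```python
import math


def step_scaled_via_xrel(step: float, fps: float, calibration_fps: float = 25, xrel: float = 0.5) -> float:
    """Standalone re-implementation of the hunk's route: step -> n_orig -> n_scaled -> alpha_scaled."""
    alpha_orig = 1.0 - step
    if not 0 < alpha_orig < 1:  # same guard as the hunk: avoid log(0) and division by log(1) = 0
        return step
    n_orig = math.log(1.0 - xrel) / math.log(alpha_orig)
    n_scaled = (fps / calibration_fps) * n_orig
    return 1.0 - (1.0 - xrel) ** (1 / n_scaled)


def step_scaled_closed_form(step: float, fps: float, calibration_fps: float = 25) -> float:
    """Equivalent closed form: alpha_scaled = alpha_orig ** (calibration_fps / fps)."""
    return 1.0 - (1.0 - step) ** (calibration_fps / fps)


max_diff = 0.0
for fps in (5, 10, 25):
    for xrel in (0.1, 0.5, 0.9):
        a = step_scaled_via_xrel(0.1, fps, xrel=xrel)
        b = step_scaled_closed_form(0.1, fps)
        max_diff = max(max_diff, abs(a - b))
print(f"largest difference between the two routes: {max_diff:.1e}")
```

With the constants shown in this diff, the effective FPS is capped at `TARGET_FPS = 25`, which equals `POSE_INTERPOLATOR_CALIBRATION_FPS`. Hence `calibration_fps / fps >= 1`, and the scaled step can only be equal to or larger than `step`. As written, the scaling compensates for slow rendering, not for rendering faster than 25 FPS.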

@@ -639,7 +802,13 @@ class Animator:
            # ...

            delta = target_pose[idx] - pose[idx]
            new_pose[idx] = pose[idx] + step * delta
            new_pose[idx] = pose[idx] + step_scaled * delta

            # Prevent denormal floats (which are really slow); important when running on CPU and approaching zero.
            # Our ϵ is really big compared to denormals; but there's no point in continuing to compute ever smaller
            # differences in the animated value when it has already almost (and visually, completely) reached the target.
            if abs(new_pose[idx] - target_pose[idx]) < EPSILON:
                new_pose[idx] = target_pose[idx]
        return new_pose

# --------------------------------------------------------------------------------
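The comment block in `interpolate_pose` relates `step` to the cooling coefficient β only loosely. Reading one frame of duration Δt off the analytical solution `u(t) = u∞ + [u0 - u∞] exp(-β t)` makes the relation exact: `step = 1 - exp(-β Δt)`, i.e. `α = exp(-β Δt)`. This is why changing Δt amounts to raising `α` to a power. For comparison:

- forward Euler gives `step = β Δt`;
- backward Euler gives `step = β Δt / (1 + β Δt)`.

The sketch below evaluates all three. The value `β = -25 log 0.9` (about 2.63 per second) is an assumption, chosen so that the exact step at 25 FPS equals the default `step = 0.1`.

```python
import math


def step_exact(beta: float, dt: float) -> float:
    """Step read off the analytical solution of u' = -beta [u - u_inf] over one frame of length dt."""
    return 1.0 - math.exp(-beta * dt)


def step_forward_euler(beta: float, dt: float) -> float:
    """unew = uold - beta dt [uold - u_inf]  =>  step = beta dt."""
    return beta * dt


def step_backward_euler(beta: float, dt: float) -> float:
    """unew = uold - beta dt [unew - u_inf]  =>  step = beta dt / (1 + beta dt)."""
    return beta * dt / (1.0 + beta * dt)


BETA = -25 * math.log(0.9)  # assumption: chosen so that the exact step at 25 FPS is 0.1

for fps in (10, 25, 60):
    dt = 1.0 / fps
    print(f"{fps:3d} FPS: exact {step_exact(BETA, dt):.5f}, "
          f"forward Euler {step_forward_euler(BETA, dt):.5f}, "
          f"backward Euler {step_backward_euler(BETA, dt):.5f}")
```

The exact step at 60 FPS coincides with the rescaled step `1 - 0.9 ** (25 / 60)`. The forward Euler step lies above it and the backward Euler step below it.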