postproc: brightness filters, without affecting translucency

This commit is contained in:
Juha Jeronen
2024-01-20 02:45:52 +02:00
parent 16c341f334
commit cbb3d060be
3 changed files with 70 additions and 27 deletions

View File

@@ -240,25 +240,27 @@ Currently, we provide some filters that simulate a lo-fi analog video look.
**General use**:
- `alphanoise`: Adds noise to the alpha channel (translucency).
- `lumanoise`: Adds noise to the brightness (luminance).
- `desaturate`: A desaturation filter with bells and whistles. Besides converting the image to grayscale, it can optionally pass through colors that match the hue of a given RGB color (e.g. keep red things while desaturating the rest), and tint the final result (e.g. for an amber monochrome computer monitor look).
The `alphanoise` filter could represent the display of a lo-fi scifi hologram, as well as noise in an analog video tape (which in this scheme belongs to "transport").
The noise filters could represent the display of a lo-fi scifi hologram, as well as noise in an analog video tape (which in this scheme belongs to "transport").
The `desaturate` filter could represent either a black and white video camera, or a monochrome display.
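Both noise filters share the same core operation: blend each pixel of the target channel toward per-pixel random noise, with `magnitude` interpolating between a no-op and full modulation. A minimal numpy sketch of that blend (the names here are illustrative, not the project's API):

```python
import numpy as np

def modulate(channel: np.ndarray, noise: np.ndarray, magnitude: float) -> np.ndarray:
    """Blend a channel toward per-pixel noise: magnitude=0 is a no-op,
    magnitude=1 scales each pixel fully by its noise value."""
    return channel * ((1.0 - magnitude) + magnitude * noise)

rng = np.random.default_rng(42)
y = rng.random((4, 4))   # stand-in for the Y (or alpha) channel
n = rng.random((4, 4))   # per-pixel noise in [0, 1)
assert np.allclose(modulate(y, n, 0.0), y)      # zero magnitude: channel unchanged
assert np.allclose(modulate(y, n, 1.0), y * n)  # full magnitude: pure noise modulation
```

`alphanoise` applies this blend to the alpha plane, `lumanoise` to the Y plane of a YUV-converted copy.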
#### Postprocessor example: HDR, scifi hologram
The bloom works best on a dark background. We use `alphanoise` to add an imperfection to the simulated display device, causing individual pixels to dynamically vary in their alpha. The `banding` and `scanlines` filters complete the look of how holograms are often depicted in scifi video games and movies. The `"dynamic": true` makes the dimmed field (top or bottom) flip each frame, like on a CRT television.
The bloom works best on a dark background. We use `lumanoise` to add an imperfection to the simulated display device, causing individual pixels to dynamically vary in their brightness (luminance). The `banding` and `scanlines` filters complete the look of how holograms are often depicted in scifi video games and movies. The `"dynamic": true` makes the dimmed field (top or bottom) flip each frame, like on a CRT television, and `"channel": "A"` applies the effect to the alpha channel, making the "hologram" translucent. (The default is `"channel": "Y"`, affecting the brightness, but not translucency.)
```
"postprocessor_chain": [["bloom", {}],
["translucency", {"alpha": 0.9}],
["alphanoise", {"magnitude": 0.1, "sigma": 0.0}],
["lumanoise", {"magnitude": 0.1, "sigma": 0.0}],
["banding", {}],
["scanlines", {"dynamic": true}]
["scanlines", {"dynamic": true, "channel": "A"}]
]
```
Note that we could also use the `translucency` filter to make the character translucent, e.g.: `["translucency", {"alpha": 0.7}]`.
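Conceptually, a `postprocessor_chain` is just sequential application of named filters with their per-filter settings. A minimal stand-in sketch (the filter functions here are hypothetical toys, not the actual tha3 filters, which mutate a torch tensor in place):

```python
def apply_chain(image: list, chain: list) -> list:
    """Apply each (filter_name, settings) pair in order."""
    filters = {
        "darken": lambda img, factor=0.5: [p * factor for p in img],
        "clamp": lambda img, hi=1.0: [min(p, hi) for p in img],
    }
    for name, settings in chain:
        image = filters[name](image, **settings)
    return image

# Chains use the same [name, settings] pair format as the JSON config above.
result = apply_chain([0.2, 1.0, 3.0],
                     [["darken", {"factor": 0.5}],
                      ["clamp", {"hi": 1.0}]])
assert result == [0.1, 0.5, 1.0]
```

Order matters: e.g. applying `banding` before `scanlines` means the scanline dimming also darkens the banding.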
#### Postprocessor example: cheap video camera, amber monochrome computer monitor
We first simulate a cheap video camera with low-quality optics via the `chromatic_aberration` and `vignetting` filters.
@@ -272,7 +274,7 @@ The `banding` and `scanlines` filters suit this look, so we apply them here, too
["vignetting", {}],
["desaturate", {"tint_rgb": [1.0, 0.5, 0.2]}],
["banding", {}],
["scanlines", {"dynamic": false}]
["scanlines", {"dynamic": false, "channel": "A"}]
]
```
@@ -284,15 +286,13 @@ Then we again render the output on a simulated CRT TV, as appropriate for the 19
```
"postprocessor_chain": [["bloom", {}],
["chromatic_aberration", {}],
["vignetting", {}],
["analog_lowres", {}],
["alphanoise", {"magnitude": 0.3, "sigma": 2.0}],
["lumanoise", {"magnitude": 0.3, "sigma": 2.0}],
["analog_badhsync", {}],
["analog_vhsglitches", {"unboost": 1.0}],
["analog_vhstracking", {}],
["banding", {}],
["scanlines", {"dynamic": true}]
["scanlines", {"dynamic": true, "channel": "A"}]
]
```

View File

@@ -5,16 +5,6 @@
As of January 2024, preferably to be completed before the next release.
#### Backend
- Postprocessor: make real brightness filters, to decouple translucency from all other filters.
- Currently many of the filters abuse the alpha channel as a luma substitute, which looks fine for a scifi hologram, but not for some other use cases.
- Need to convert between RGB and some other color space. Preferably not YUV, since that doesn't map so well to RGB and back.
https://stackoverflow.com/questions/17892346/how-to-convert-rgb-yuv-rgb-both-ways
https://www.cs.sfu.ca/mmbook/programming_assignments/additional_notes/rgb_yuv_note/RGB-YUV.pdf
- Maybe HSL, or HCL, or a combined strategy from both, like in this R package:
https://colorspace.r-forge.r-project.org/articles/manipulation_utilities.html
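The TODO above asks for a conversion that maps cleanly from RGB and back. A full-range Rec. 709 YPbPr-style transform round-trips exactly, since it is a pure linear change of basis; a numpy sketch (the `rgb_to_yuv`/`yuv_to_rgb` helpers the commit actually adds may use different scaling):

```python
import numpy as np

# BT.709 luma weights; U/V are scaled colour differences (full-range YPbPr style).
def rgb_to_yuv(rgb: np.ndarray) -> np.ndarray:
    r, g, b = rgb
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    u = (b - y) / 1.8556
    v = (r - y) / 1.5748
    return np.stack([y, u, v])

def yuv_to_rgb(yuv: np.ndarray) -> np.ndarray:
    y, u, v = yuv
    r = y + 1.5748 * v
    b = y + 1.8556 * u
    g = (y - 0.2126 * r - 0.0722 * b) / 0.7152
    return np.stack([r, g, b])

rng = np.random.default_rng(0)
rgb = rng.random((3, 8, 8))
assert np.allclose(yuv_to_rgb(rgb_to_yuv(rgb)), rgb)      # lossless round trip
assert np.allclose(rgb_to_yuv(np.ones((3, 2, 2)))[0], 1.0)  # white has Y = 1
```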
#### Frontend
- Figure out why the crop filter doesn't help in positioning the `talkinghead` sprite in *MovingUI* mode.

View File

@@ -14,7 +14,7 @@ from typing import Dict, List, Optional, Tuple, TypeVar, Union
import torch
import torchvision
from tha3.app.util import RunningAverage
from tha3.app.util import RunningAverage, luminance, rgb_to_yuv, yuv_to_rgb
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
@@ -119,6 +119,7 @@ class Postprocessor:
# Caches for individual dynamic effects
self.alphanoise_last_image = defaultdict(lambda: None)
self.lumanoise_last_image = defaultdict(lambda: None)
self.vhs_glitch_interval = defaultdict(lambda: 0.0)
self.vhs_glitch_last_frame_no = defaultdict(lambda: 0.0)
self.vhs_glitch_last_image = defaultdict(lambda: None)
@@ -233,6 +234,7 @@ class Postprocessor:
Only makes sense when the talkinghead is rendered on a dark-ish background.
`luma_threshold`: How bright is bright. 0.0 is full black, 1.0 is full white.
(Technically, true relative luminance, not luma, since we work in linear RGB space.)
`hdr_exposure`: Controls the overall brightness of the output. Like in photography,
higher exposure means brighter image (saturating toward white).
"""
@@ -240,7 +242,7 @@ class Postprocessor:
# https://learnopengl.com/Advanced-Lighting/Bloom
# Find the bright parts.
Y = 0.2126 * image[0, :, :] + 0.7152 * image[1, :, :] + 0.0722 * image[2, :, :] # HDTV luminance (ITU-R Rec. 709)
Y = luminance(image[:3, :, :])
mask = torch.ge(Y, luma_threshold) # [h, w]
# Make a copy of the image with just the bright parts.
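The `luminance` helper imported from `tha3.app.util` is not shown in this diff; assuming it just factors out the inline Rec. 709 formula it replaces, a minimal numpy sketch:

```python
import numpy as np

def luminance(rgb: np.ndarray) -> np.ndarray:
    """Rec. 709 relative luminance from a [3, h, w] linear-RGB image."""
    return 0.2126 * rgb[0] + 0.7152 * rgb[1] + 0.0722 * rgb[2]

white = np.ones((3, 2, 2))
assert np.allclose(luminance(white), 1.0)            # the weights sum to exactly 1
assert luminance(np.zeros((3, 2, 2))).shape == (2, 2)  # per-pixel scalar map
```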
@@ -349,7 +351,7 @@ class Postprocessor:
magnitude: float = 0.1,
sigma: float = 0.0,
name: str = "alphanoise0") -> None:
"""[dynamic] Dynamic noise to alpha channel. A cheap alternative to luma noise.
"""[dynamic] Dynamic noise to alpha channel.
`magnitude`: How much noise to apply. 0 is off, 1 is as much noise as possible.
@@ -382,6 +384,48 @@ class Postprocessor:
base_magnitude = 1.0 - magnitude
image[3, :, :].mul_(base_magnitude + magnitude * noise_image)
def lumanoise(self, image: torch.tensor, *,
magnitude: float = 0.1,
sigma: float = 0.0,
name: str = "lumanoise0") -> None:
"""[dynamic] Dynamic noise to luminance, without touching colors or alpha.
Based on converting `image` from RGB to YUV, noising it there, and converting back.
`magnitude`: How much noise to apply. 0 is off, 1 is as much noise as possible.
`sigma`: If nonzero, apply a Gaussian blur to the noise, thus reducing its spatial frequency
(i.e. making larger and smoother "noise blobs").
The blur kernel size is fixed to 5, so `sigma = 1.0` is the largest that will be
somewhat accurate. Nevertheless, `sigma = 2.0` looks acceptable, too, producing
square blobs.
`name`: Optional name for this filter instance in the chain. Used as cache key.
If you have more than one `lumanoise` in the chain, they should have
different names so that each one gets its own cache.
Suggested settings:
Scifi hologram: magnitude=0.1, sigma=0.0
Analog VHS tape: magnitude=0.2, sigma=2.0
"""
# Re-randomize the noise image whenever the normalized frame changes
if self.lumanoise_last_image[name] is None or int(self.frame_no) > int(self.last_frame_no):
c, h, w = image.shape
noise_image = torch.rand(h, w, device=self.device, dtype=image.dtype)
if sigma > 0.0:
noise_image = noise_image.unsqueeze(0) # [h, w] -> [c, h, w] (where c=1)
noise_image = torchvision.transforms.GaussianBlur((5, 5), sigma=sigma)(noise_image)
noise_image = noise_image.squeeze(0) # -> [h, w]
self.lumanoise_last_image[name] = noise_image
else:
noise_image = self.lumanoise_last_image[name]
base_magnitude = 1.0 - magnitude
image_yuv = rgb_to_yuv(image[:3, :, :])
image_yuv[0, :, :].mul_(base_magnitude + magnitude * noise_image)
image_rgb = yuv_to_rgb(image_yuv)
image[:3, :, :] = image_rgb
# --------------------------------------------------------------------------------
# Lo-fi analog video
@@ -654,7 +698,7 @@ class Postprocessor:
strength_field = strength # just a scalar!
# Desaturate, then apply tint
Y = 0.2126 * R + 0.7152 * G + 0.0722 * B # HDTV luminance (ITU-R Rec. 709) -> [h, w]
Y = luminance(image[:3, :, :]) # -> [h, w]
Y = Y.unsqueeze(0) # -> [1, h, w]
tint_color = torch.tensor(tint_rgb, device=self.device, dtype=image.dtype).unsqueeze(1).unsqueeze(2) # [c, 1, 1]
tinted_desat_image = Y * tint_color # -> [c, h, w]
@@ -693,12 +737,16 @@ class Postprocessor:
def scanlines(self, image: torch.tensor, *,
field: int = 0,
dynamic: bool = True) -> None:
dynamic: bool = True,
channel: str = "Y") -> None:
"""[dynamic] CRT TV like scanlines.
`field`: Which CRT field is dimmed at the first frame. 0 = top, 1 = bottom.
`dynamic`: If `True`, the dimmed field will alternate each frame (top, bottom, top, bottom, ...)
for a more authentic CRT look (like Phosphor deinterlacer in VLC).
`channel`: One of:
"Y": darken the luminance (converts to YUV and back, slower)
"A": darken the alpha channel (fast, but makes the darkened lines translucent)
Note that "frame" here refers to the normalized frame number, at a reference of 25 FPS.
"""
@@ -706,5 +754,10 @@ class Postprocessor:
start = (field + int(self.frame_no)) % 2
else:
start = field
# We should ideally modify just the Y channel in YUV space, but modifying the alpha instead looks alright, and is much cheaper.
image[3, start::2, :].mul_(0.5)
if channel == "A": # alpha
image[3, start::2, :].mul_(0.5)
else: # "Y", luminance
image_yuv = rgb_to_yuv(image[:3, :, :])
image_yuv[0, start::2, :].mul_(0.5)
image_rgb = yuv_to_rgb(image_yuv)
image[:3, :, :] = image_rgb
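Both `channel` modes dim every second row by the same factor; only the target plane differs. An illustrative numpy sketch of the field-alternation logic alone (not the project's torch implementation):

```python
import numpy as np

def scanlines(image: np.ndarray, frame_no: int, field: int = 0,
              dynamic: bool = True) -> np.ndarray:
    """Dim every second row of a [c, h, w] image by 50%.
    With dynamic=True, the dimmed field alternates each frame."""
    out = image.copy()
    start = (field + frame_no) % 2 if dynamic else field
    out[:, start::2, :] *= 0.5
    return out

img = np.ones((3, 4, 4))
f0 = scanlines(img, frame_no=0)  # even rows dimmed
f1 = scanlines(img, frame_no=1)  # odd rows dimmed
assert np.allclose(f0[:, 0::2, :], 0.5) and np.allclose(f0[:, 1::2, :], 1.0)
assert np.allclose(f1[:, 1::2, :], 0.5) and np.allclose(f1[:, 0::2, :], 1.0)
```

With `channel="A"` the dimming hits only `image[3]`; with `channel="Y"` it hits the luminance plane after an RGB-to-YUV conversion, which costs two extra conversions per frame but leaves translucency alone.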